1 Introduction

1.1 Huge networks 

In the last decade it became apparent that a large number of the most interesting structures and 
phenomena of the world can be described by networks: separable elements, with connections (or 
interactions) between certain pairs of them. 

• Among such networks, the best known and the most studied is the internet. Moreover,
the internet (as the physical underlying network) gives rise to many other networks: the
network of hyperlinks (web, logical Internet), Internet-based social networks, distributed
databases, etc. The size of the internet is growing fast: currently the number of web pages
may be 30 billion or more, and the number of devices is probably more than a billion.

• Social networks are basic objects of many studies in the area of sociology, history, epidemi- 
ology and economics. The largest social network is the acquaintance graph of all living 
people, with about 7 billion nodes. 

• Biology contributes ecological networks, networks of interactions between proteins, and the 
human brain, just to mention a few. The human brain is really large for its mass, having
about $10^{11}$ nodes.

• Statistical physics studies the interactions between large numbers of discrete particles, 
where the underlying structure is often described by a graph. For example, a crystal can
be thought of as a graph whose nodes are the atoms and whose edges represent chemical
bonds. A perfect crystal is a rather boring graph, but impurities and imperfections create
interesting graph-theoretical digressions. 12 grams of diamond has about $6 \times 10^{23}$ nodes.

• Some of the largest networks in engineering occur in chip design. Even though these
networks are man-made and planned, many of their properties are difficult to determine
by computation due to their huge size. There can be more than a billion transistors on a 
chip now. 

• To be pretentious, we can say that the whole universe is a single (really huge, possibly 
infinite) network, where the nodes are events (interactions between elementary particles), 
and the edges are the particles themselves. This is a network with perhaps $10^{80}$ nodes.

These huge networks pose exciting challenges for the mathematician. Graph Theory (the 
mathematical theory of networks) has been one of the fastest developing areas of mathematics 
in the last decades; with the appearance of the Internet, however, it faces fairly novel, uncon- 
ventional problems. In traditional graph theoretical problems the whole graph is exactly given, 
and we are looking for relationships between its parameters or efficient algorithms for computing 
its parameters. On the other hand, very large networks (like the Internet) are never completely
known; in most cases they are not even well defined. Data about them can be collected only
by indirect means like random local sampling or by monitoring the behavior of various global 
processes. 

Dense networks (in which a node is adjacent to a positive fraction of the other nodes) and sparse
networks (in which a node has a bounded number of neighbors) behave very differently.
From a practical point of view, sparse networks are more important, but at present we have 
more complete theoretical results for dense networks. 

1.2 What to ask about them? 

Let us discuss four possible questions that can be asked about a really large graph, say the
internet. 

Question 1. Does the graph have an odd number of nodes? 

This is a very basic property of a graph in the classical setting. For example, it is one of 
the first theorems or exercises in a graph theory course that every graph with an odd number of
nodes has a node with even degree.

But for the internet, this question is clearly nonsense. Not only does the number of nodes 
change all the time, with devices going online and offline, but even if we fix a specific time like
12:00am today, it is not well-defined: there will be computers just in the process of booting up,
breaking down, etc.

Question 2. What is the average degree of nodes? 

This, on the other hand, is a meaningful question. Of course, the average degree can only be 
determined with a certain error, and it will change with technology or the social composition of 
users; but at a given time, a good approximation can be sought (I am not speaking now about 
how to find it). 

Question 3. Is the graph connected?

To this question, the answer is almost certainly no: somewhere there will be a faulty router 
with some unhappy users on the wrong side of it. But this is not the interesting way to ask 
the question: we should consider the internet disconnected if, say, an earthquake combined 
with a solar flare severs all connections between the Old and New worlds. So we want to ignore
small components that are negligible with respect to the whole graph, and consider the graph
disconnected only if it decomposes into two parts which are commensurable with the whole. On
the other hand, we may want to allow that the two parts be connected by a few edges, and still 
consider the graph disconnected. 

Question 4. Find the largest cut in the graph. 

(This means to find the partition of the nodes into two classes so as to maximize the number of 
edges connecting the two classes.) This example shows that even if the question is meaningful, it
is not clear in what form we can expect the answer. The fraction of edges contained in the largest
cut can be determined relatively easily (with an error that is small with large probability); but
how to specify the largest cut itself (or even an approximate version of it)? 

1.3 How to obtain information about them? 

If we face a large network (think of the internet) the first challenge is to obtain information 
about it. Often, we don't even know the number of nodes. 

1.3.1 Local sampling 

Properties of very large graphs can be studied by sampling small subgraphs. The theory of this, 
called property testing in computer science, emerged in the last decade, and will be one of the 
main concerns of this paper. 

In the case of dense graphs G, the sampling process is simple: we select independently a fixed 
number k of random nodes, and determine the edges between them, to get a random induced 
subgraph. We'll call this subgraph sampling. For each graph F, this defines a probability of
seeing F when |V(F)| nodes are sampled, and so it gives a probability distribution $\sigma_{G,k}$ on
all graphs with k (labeled) nodes. It turns out that this sample contains enough information
to determine many properties and parameters of the graph (with an error that is, with large
probability, arbitrarily small if k is sufficiently large depending only on the error bound).
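
The subgraph sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the big graph is given as a dictionary of neighbor sets, the k sampled nodes are taken to be distinct, and the helper names (`sample_induced_subgraph`, `estimate_edge_density`, `complete_bipartite`) are hypothetical, not taken from the text.

```python
import random
from itertools import combinations


def sample_induced_subgraph(adj, k, rng):
    """Pick k distinct random nodes and return the induced subgraph on labels 0..k-1.

    adj maps every node of the big graph to the set of its neighbors.
    The result is a set of pairs (i, j), i < j, over the sample positions.
    """
    nodes = rng.sample(sorted(adj), k)
    return {(i, j) for i, j in combinations(range(k), 2)
            if nodes[j] in adj[nodes[i]]}


def estimate_edge_density(adj, k, trials, rng):
    """Average edge density over `trials` independent k-node samples."""
    pairs = k * (k - 1) // 2
    total = sum(len(sample_induced_subgraph(adj, k, rng)) for _ in range(trials))
    return total / (trials * pairs)


def complete_bipartite(n):
    """Hypothetical test graph K_{n,n}; its edge density is n^2 / C(2n, 2)."""
    left, right = range(n), range(n, 2 * n)
    adj = {u: set(right) for u in left}
    adj.update({v: set(left) for v in right})
    return adj
```

Averaging the edge counts of many small samples estimates the edge density of the whole graph without ever reading all of it, which is the simplest instance of determining a parameter from the sample distribution.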

To get a mathematically exact description of algorithms for very large graphs, we define a 
subgraph sampling oracle as a black box that, for a given positive integer m, returns a random m- 
node graph from some (otherwise unknown) distribution. We think of this as a random induced 
subgraph of a very large, otherwise unknown graph G. We assume that the oracle is consistent 
in the sense that for any k there is a graph G such that the distribution of the k-samples from
G is arbitrarily close to the distribution of the answers by the oracle. (Theorem 6.13 will give a
characterization of consistent distributions.) 

In the case of sparse graphs with bounded degree, the subgraph sampling method gives a 
trivial result: the sampled subgraph will almost certainly be edgeless. Probably the most natural 
way to fix this is to consider neighborhood sampling. Let $\mathcal{G}_d$ denote the class of finite graphs with
all degrees bounded by d. For $G \in \mathcal{G}_d$, select a random node and explore its neighborhood to a
given depth m. This provides a probability distribution $\rho_{G,m}$ on graphs in $\mathcal{G}_d$, with a specified
root node, such that all nodes are at distance at most m from the root. We will refer to
these rooted graphs, for short, as m-balls. Note that the number of possible m-balls is finite if d and m are
fixed. We can formulate this abstractly as a neighborhood sampling oracle, a black box that, for 
a given positive integer m, returns an m-ball. 
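
A neighborhood sampling oracle of this kind can be sketched with a breadth-first search. The sketch assumes that the m-ball is the subgraph induced by all nodes within distance m of the root, and that the graph is stored as a dictionary of neighbor sets; the function names are hypothetical.

```python
import random
from collections import deque


def m_ball(adj, root, m):
    """Return (nodes, edges) of the subgraph induced by nodes within distance m of root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue  # do not explore beyond depth m
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges


def neighborhood_sample(adj, m, rng):
    """Oracle sketch: the m-ball around a uniformly random root node."""
    root = rng.choice(sorted(adj))
    nodes, edges = m_ball(adj, root, m)
    return root, nodes, edges
```

Since all degrees are at most d, the search touches at most 1 + d + ... + d^m nodes, so the cost of one sample does not depend on the size of the graph.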

The situation for sparse graphs is, however, less satisfactory than for dense graphs, for two 
reasons. First, a full characterization of consistent neighborhood sampling oracles is not known 
(cf. Conjecture 7.2). Second, neighborhood sampling does not reveal important global properties
of the graph like expansion. This suggests looking at further possibilities. Suppose, for example, 
that instead of exploring the neighborhood of a single random node, we could select two (or more) 
random nodes and determine simple quantities associated with them, like pairwise distances, 
maximum flow, electrical resistance, hitting times of random walks. What information can be 
gained by such tests? Is there a "complete" set of tests that would give enough information to 
determine the global structure of the graph to a reasonable accuracy? These methods should 
lead to different theories of large graphs and their limit objects, largely unexplored. 

Sample distributions (in both the dense and sparse cases) are equivalent to counting induced
subgraphs of a given type. Instead of this, we could count homomorphisms (or injective
homomorphisms) of "small" graphs into the graph. The connection with sample distributions can be
expressed by inclusion-exclusion formulas, and the difference is not essential. Often homomorphism numbers
are algebraically better behaved, and they also have the advantage that they suggest different, 
"dual" approaches by reversing the arrows in the category of graph homomorphisms. 

1.3.2 Observing global processes 

Another source of information about a network is the observation of the behavior of various 
global processes either globally (through measuring some global parameter), or locally (at one 
node, or a few neighboring nodes, but for a longer time). Statistical physical models on the 
graph are examples of the first kind of approach (we return to them in Section 2.3.3). Crawlers
can be considered as examples of the second, and there are some sporadic results about the local 
observation of simpler, random processes [131 [TJ]. A general theory of such local observation has 
not emerged yet though. 

1.3.3 Left and right homomorphisms 

Instead of testing, it is often more convenient to talk about homomorphisms (adjacency- 
preserving maps) between graphs. This leads to the following setup. If we are given a (large) 
graph G, we may try to study its local structure by counting homomorphisms from various 
"small" graphs F into G; and we can study its global structure by counting its homomorphisms 
into various small graphs H. The first type of information is closely related (in many cases, equiv- 
alent) to sampling, while the second is related to statistical physics. As in statistical physics, 
one needs weighted graphs H here to get meaningful results. 

1.4 How to model them?

1.4.1 Random graphs 

We are celebrating the 50th birthday of random graphs this year: the simplest random graph
model was developed by Erdős and Rényi [44] and Gilbert [55] in 1959. Given a positive integer
n and a real number $0 < p < 1$, we generate a random graph $G(n,p)$ by taking n nodes, say
$[n] = \{1, \dots, n\}$, and connecting any two of them with probability p, making an independent
decision about every pair.

There are alternate models, essentially equivalent: we could fix the number of edges m, and 
then choose a random m-element subset of the set of pairs in [n], uniformly from all such subsets.
This random graph $G(n,m)$ is very similar to $G(n,p)$ when $m = p\binom{n}{2}$. Another model, closer to
some of the more recent developments, is evolving random graphs, where edges are added one by 
one, always choosing uniformly from the set of unconnected pairs. Stopping this process after 
m steps, we get G(n, m). 
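
The three models just described can be written down directly. The following is a minimal sketch, assuming graphs are represented as sets of pairs over the node set {1, ..., n}; the function names are hypothetical.

```python
import random
from itertools import combinations


def gnp(n, p, rng):
    """G(n, p): include each pair of nodes independently with probability p."""
    return {pair for pair in combinations(range(1, n + 1), 2) if rng.random() < p}


def gnm(n, m, rng):
    """G(n, m): a uniformly random m-element subset of all pairs."""
    return set(rng.sample(list(combinations(range(1, n + 1), 2)), m))


def evolving(n, m, rng):
    """Add edges one by one, each uniform among the still unconnected pairs."""
    free = list(combinations(range(1, n + 1), 2))
    edges = set()
    for _ in range(m):
        i = rng.randrange(len(free))
        free[i], free[-1] = free[-1], free[i]  # swap-remove the chosen pair
        edges.add(free.pop())
    return edges
```

Stopping `evolving` after m steps produces an m-element edge set chosen uniformly at random, which is why it agrees in distribution with `gnm`.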

Erdős-Rényi random graphs have many interesting, often surprising properties, and a huge
literature; see [20], [68]. One conventional wisdom about random graphs with a given edge density
is that they are all alike. Their basic parameters, like chromatic number, maximum clique, 
triangle density, spectra etc. are highly concentrated. This fact will be an important motivation 
when defining the right measure of global similarity of graphs. 

Many generalizations of this random graph model have been studied. For example, one 
could have different probabilities assigned to different edges. A variation of this idea, discovered 
independently in [85], [22] and perhaps elsewhere, is the notion of W-random graphs, to be
discussed in Section 3.1.2 and used throughout these notes.

1.4.2 Randomly growing graphs 

Random graph models on a fixed set of nodes, discussed above, fail to reproduce important 
properties of real-life networks. For example, the degrees of Erdős-Rényi random graphs follow
a binomial distribution, and so they are asymptotically normal if the edge probability p is a
constant, and asymptotically Poisson if the expected degree is constant (i.e., $p = p(n) \sim c/n$).
In either case, the degrees are highly concentrated around the mean, while the degrees of real
life networks tend to obey the "Zipf phenomenon", which means that the tail of the distribution
decreases according to a power law. 

In 2002 Albert and Barabasi [11 [13] created a random network model growing according to 
natural rules, which could reproduce this behavior. Since then, many variations of growing
networks have been introduced. The process of graph generation usually consists of random steps
obeying some local rules. 

This is perhaps the first point which suggests one of our main tools, namely assigning limits 
to sequences of graphs. Just as the Law of Large Numbers tells us that adding up more and 
more independent random variables we get an increasingly deterministically behaving number, 
these growing graph sequences tend to have a well-defined structure, independent of the random 
choices made along the way. In the limit, the randomness disappears, and the asymptotic 
behavior of the sequence can be described by a well-defined limit object. You will find more on
this in Sections 1.5.3 and 6.5.

1.4.3 Quasirandom graphs 

The theory of quasirandom graphs, introduced by Thomason [117] and Chung, Graham and
Wilson [33], is based on the following observation: not only have random graphs a variety of
quite strict properties (with large probability), but for several of these basic properties, the 
exceptional graphs are the same. In other words, any of these properties implies the others, 
regardless of any stochastic consideration. 

To make this idea precise, we consider a sequence of graphs $(G_n)$ with $|V(G_n)| \to \infty$. For
simplicity, assume that $|V(G_n)| = n$. Let $0 < p < 1$ be a real number. Consider the following
properties of these graphs.

(P1) All degrees are asymptotically $pn$ and all codegrees (numbers of common neighbors of
two nodes) are asymptotically $p^2 n$.

(P2) For every fixed graph F, the number of homomorphisms of F into $G_n$ is asymptotically
$p^{|E(F)|} n^{|V(F)|}$.

(P3) The number of edges is asymptotically $pn^2/2$ and the number of 4-cycles is asymptotically
$p^4 n^4/8$.

(P4) The number of edges induced by a set of nodes of size $\alpha n$ is asymptotically $p\alpha^2 n^2/2$.

All these properties hold with probability 1 if $G_n = G(n,p)$. However, more is true: if a graph
sequence satisfies any one of them, then it satisfies all [33]. Such graph sequences are called
quasirandom. The four properties above are only a sampler; there are many other random-like
properties that are also equivalent to these [33], [108], [109].

Many interesting deterministic graph sequences are quasirandom. We mention an important 
example from number theory: 

Example 1.1 Paley graphs. Let $p_n$ be the n-th prime congruent to 1 modulo 4, and let us define
a graph on $\{1, \dots, p_n\}$ by connecting i and j if and only if $i - j$ is a quadratic residue modulo $p_n$. The
Paley graphs converge to the function W = 1/2.
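
The quasirandomness of Paley graphs can be observed numerically on small cases. The sketch below builds the Paley graph on the residues 0, ..., p-1 (an assumption; any complete residue system gives the same graph) and computes the homomorphism densities of an edge and of a 4-cycle, which for a quasirandom sequence with p = 1/2 should approach 1/2 and 1/16. The function names are hypothetical.

```python
def paley_graph(p):
    """Adjacency sets of the Paley graph on Z_p, for a prime p congruent to 1 mod 4."""
    residues = {(x * x) % p for x in range(1, p)}  # nonzero quadratic residues
    return {i: {j for j in range(p) if (i - j) % p in residues} for i in range(p)}


def edge_density(adj):
    """hom(K2, G) / n^2, where hom(K2, G) is the sum of the degrees."""
    n = len(adj)
    return sum(len(nb) for nb in adj.values()) / n ** 2


def c4_density(adj):
    """hom(C4, G) / n^4 = trace(A^4) / n^4, computed as the sum of squared codegrees."""
    n = len(adj)
    total = sum(len(adj[i] & adj[j]) ** 2 for i in adj for j in adj)
    return total / n ** 4
```

The condition that p is congruent to 1 modulo 4 makes -1 a quadratic residue, so the adjacency relation is symmetric. The 4-cycle count uses the identity that the (i, j) entry of the squared adjacency matrix is the number of common neighbors of i and j.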

The theory of convergent graph sequences (Section 6) can be considered as a rather
far-reaching generalization of quasirandom sequences.

1.5 How to approximate them? 

We want a compact approximate description of a very large network, usually in the form of a
(relatively) small network, or at least a network with a compact description. To make this
mathematically precise, we need to define what we mean by two graphs being "similar" or
"close", and describe what kind of structures we use for approximation.

1.5.1 The distance of two graphs

There are many ways of defining the distance of two graphs G and G'. Suppose that the two
graphs have a common node set [n]. Then a natural notion of distance is the edit distance,
defined as the number of edges to be changed to get from one graph to the other. Since our
graphs are very large, we want to normalize this, and define

$$d_1(G,G') = \frac{|E(G) \triangle E(G')|}{\binom{n}{2}}.$$

While this distance plays an important role in the study of testable graph properties, it does not 
reflect structural similarity well. To raise one objection, consider two random graphs on [n] with 
edge-density 1/2. As mentioned in the introduction, these graphs are very similar from almost 
every aspect, but their normalized edit distance is large (about 1/2 with large probability). One 
might try to improve this by relabeling one of them to get the best overlay, minimizing the edit
distance; but the improvement would be marginal (o(1)).
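
The objection can be checked experimentally. The sketch below assumes that the normalized edit distance divides the size of the symmetric difference of the edge sets by the number of pairs, n(n-1)/2; the function names are hypothetical.

```python
import random
from itertools import combinations


def random_graph(n, p, rng):
    """Edge set of G(n, p) on the node set {1, ..., n}."""
    return {pair for pair in combinations(range(1, n + 1), 2) if rng.random() < p}


def normalized_edit_distance(edges_a, edges_b, n):
    """Size of the symmetric difference of the edge sets, divided by C(n, 2)."""
    return len(edges_a ^ edges_b) / (n * (n - 1) // 2)
```

For two independent graphs with edge density 1/2, each pair of nodes is an edge in exactly one of them with probability 1/2, so the normalized distance concentrates around 1/2 even though the two graphs are statistically indistinguishable.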

Another trouble with the notion of edit distance is that it is only defined when the two graphs 
have the same number of nodes. 

We could base the measurement of distance on sampling. We define the sampling distance of 
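two graphs G and G' by

$$d_{\rm sample}(G,G') = \sum_{k=1}^{\infty} \frac{1}{2^k}\, d_{\rm tv}(\sigma_{G,k}, \sigma_{G',k})$$

(where $d_{\rm tv}(\alpha,\beta) = \sup_X |\alpha(X) - \beta(X)|$ denotes the total variation distance of the distributions
$\alpha$ and $\beta$). Here the coefficients $1/2^k$ are quite arbitrary, only to make the sum convergent. This
distance, however, would not directly reflect any structural similarity.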

In a later section we will define a further distance between graphs, which will be satisfactory from
all these points of view: it will be defined for two graphs with possibly different numbers of nodes,
the distance of two random graphs with the same edge density will be very small, and it will
reflect global structural similarity. It will define the same topology as $d_{\rm sample}$.

The construction of the sampling distance can be carried over to bounded degree graphs, 
by replacing the sampling distributions $\sigma_{G,k}$ in its definition by the neighborhood distributions $\rho_{G,k}$. We
must point out, however, that it seems to be difficult to define a notion of distance between two 
graphs with bounded degree reflecting global similarity. 

1.5.2 Approximation by smaller: Regularity Lemma 

As the exact description of huge networks is not known, and they are too big for direct study 
(e.g., for testing different algorithms or protocols directly on the whole internet), an important 
operation would be to "scale down" by producing a smaller network with similar properties. The 
main tool for doing so is the "Szemerédi partition" or "Regularity Lemma". Szemerédi developed
his Regularity Lemma for his celebrated proof of the Erdős-Turán Conjecture on arithmetic
progressions in dense sets of integers in 1975. Since then, the Lemma has emerged as a fundamental
tool in graph theory, with many applications in extremal graph theory, combinatorial number 
theory, graph property testing etc., and became a true focus of research in the past years. 



This lemma can be viewed as an archetypal example of the dichotomy between randomness and
structure, where we try to decompose a (large and complicated) object A into a more highly
structured object A' with a (quasi)random perturbation (cf. Tao [116]). The highly structured
part may be easier to handle, and the quasirandom part will often be simpler due to Laws of Large
Numbers. We'll introduce this partition in Section 5 (and use it throughout).

Finding the Szemerédi partition of a huge dense graph is an example of the problem posed
in Question 4 in Section 1.2. Algorithm 5.4.2 will be an example of a possible solution: how a
partition of the nodes can be determined in an implicit form, even if describing for each node 
which class it belongs to would take too much space. 

1.5.3 Approximation by infinite: convergence and limits 

This idea can be motivated by how we look at a large piece of metal. This is a crystal, that is a 
really large graph consisting of atoms and bonds between them. But from many points of view 
(e.g., the use of the metal in building a bridge), it is more useful to consider it as a continuum 
with a few important parameters (density, elasticity etc.). Its behavior is governed by differential 
equations. Can we consider a more general very large graph as some kind of continuum? 

One way to make this intuition precise is to consider a growing sequence $(G_n)$ of graphs
whose number of nodes tends to infinity, and to define when such a sequence is convergent. 
(We have mentioned this idea in connection with randomly growing graphs, but now we don't 
assume anything about how the graphs in the sequence are obtained.) Our discussion of sampling 
suggests a general principle leading to a definition: we consider samples of a fixed size k from $G_n$,
and their distribution. We say that the sequence is locally convergent (with respect to the given
sampling method) if this distribution tends to a limit as $n \to \infty$ for every fixed k. The family
of limiting distributions (one for each k) can be considered as a limit object of the sequence.

For dense graphs, this notion of convergence was suggested by Erdős, Lovász and Spencer
[43], and elaborated by Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi [28], [29], [30]. For
sparse graphs, this kind of convergence was introduced by Aldous [1] and by Benjamini and
Schramm [16]. These notions will be discussed in Sections 6.1 and 7.1, respectively.

The definition above represents the limit of a graph sequence as a collection of probability 
distributions on graphs, one for each sample size. This is not always a helpful representation 
of the limit object, and a more explicit description is desirable. A next step is to represent 
the family of distributions on finite graphs (the samples) by a single probability distribution 
on countable graphs. For sparse graphs, Benjamini and Schramm provide such a description as 
certain measures on countable rooted graphs with bounded degree (see section IX2l and a similar 
description for dense graph limits is also known as certain ergodic measures on countable graphs 
( [TTT] : see Theorem [6T3]). 

More explicit descriptions of these limit objects can also be given. Let us start with the dense
case. Here the limit object can be described as a two-variable measurable function $W : [0,1]^2 \to [0,1]$,
called a graphon (Lovász and Szegedy [85]; see Section 3.1). These limit objects can be
considered as weighted graphs with a continuum underlying set, or (if you wish) as graphs on a 
nonstandard model of the unit interval. 

Let us describe an example here; more to follow in Section 6.5.2. The picture on the left
hand side of Figure 1 is the adjacency matrix of a graph G with 100 nodes, where the 1's are
represented by black squares and the 0's by white squares. The graph itself is constructed by
a simple randomized growing rule: starting with a single node, we either add a new node or a
new edge; a new node is born with probability 1/n, where n is the current number of nodes. (A
closely related graph sequence (randomly grown uniform attachment graphs) will be discussed
in detail in Section 6.5.2.)



The picture on the right hand side is a grayscale image of the function $U(x,y) = 1 - \max(x,y)$.
The similarity with the picture on the left is apparent, and suggests that the limit of the graph
sequence on the left is this function. This turns out to be the case in a well defined sense. It 
follows that to approximately compute various parameters of the graph on the left hand side, 
we can compute related parameters of the function on the right hand side. For example, the 
triangle density of the graph on the left tends (as $n \to \infty$) to the integral

$$\int_{[0,1]^3} U(x,y)\,U(y,z)\,U(x,z)\,dx\,dy\,dz.$$

Two more remarks on the dense case. Of course, a graphon can be infinitely complicated. 
But in many cases growing graph sequences have a limit graphon that is a continuous
function described by a simple formula (see a couple of examples in Section 6.5.2). Such a limit
graphon provides a very useful approximation of a large dense graph. 
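
As a worked instance of this kind of approximation, the triangle density of the function $U(x,y) = 1 - \max(x,y)$ from the example above can be evaluated in closed form. The computation below is a sketch that assumes the usual triangle density integral of a graphon, $\int U(x,y)U(y,z)U(x,z)$, and uses the symmetry of the integrand to restrict to the region $x < y < z$ (which accounts for the factor 6):

```latex
\begin{align*}
t(K_3,U) &= \int_{[0,1]^3} U(x,y)\,U(y,z)\,U(x,z)\,dx\,dy\,dz \\
         &= 6\int_0^1\!\int_0^z\!\int_0^y (1-y)(1-z)^2\,dx\,dy\,dz \\
         &= 6\int_0^1 \Bigl(\tfrac{z^2}{2}-\tfrac{z^3}{3}\Bigr)(1-z)^2\,dz \\
         &= 6\Bigl(\tfrac12\cdot\tfrac1{30}-\tfrac13\cdot\tfrac1{60}\Bigr) = \tfrac{1}{15}.
\end{align*}
```

The last step uses $\int_0^1 z^2(1-z)^2\,dz = 1/30$ and $\int_0^1 z^3(1-z)^2\,dz = 1/60$. For comparison, the edge density of U is $\int (1-\max(x,y))\,dx\,dy = 1/3$, so the triangle density 1/15 is noticeably larger than the value $(1/3)^3 = 1/27$ that a quasirandom graph of the same density would have.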

Instead of the interval [0,1], we can consider any probability space $(\Omega, \mathcal{A}, \pi)$ with a symmetric
measurable function $W : \Omega \times \Omega \to [0,1]$. This would not give greater generality, but it is
sometimes useful to represent the limit object by other probability spaces. We'll see an example
of this in Section 6.5.2.

In the sparse case, the limit object can be described as a graphing (known from group theory or 
ergodic theory, Elek [37]), or as a measure preserving graph (see Section 3.2), or as a distribution
on rooted countable graphs with special properties. 

Instead of sampling, we can use dual (global) measurements, more precisely, homomorphisms 
into fixed small graphs, to define convergence. The remarkable fact is that under the right 
conditions, this leads to an equivalent notion! (See Sections 6.6 and 7.3.)




Figure 1: A randomly grown uniform attachment graph with 100 nodes 

1.5.4 Optimization problems for graphs

We have presented the theory of convergent graph sequences and their limits as an answer to
problems coming from very large networks, but a very strong motivation comes from extremal 
graph theory. 

Consider the following two optimization problems. 

Classical optimization problem. Find the minimum of $x^3 - 6x$ where x is a nonnegative
real number.

Graph optimization problem. Find the minimum of $t(C_4,G)$ over all graphs G with
$t(K_2,G) \ge 1/2$. (Here $t(F,G)$, the homomorphism density of F in G, denotes the probability
that a random map of V(F) into V(G) preserves the edges. $C_4$ denotes the 4-cycle and $K_2$ is
the complete graph with 2 nodes.)

The solution of the classical optimization problem is of course $x = \sqrt{2}$. This means that it
has no solution over the rationals, but we can find rational numbers that are arbitrarily close 
to being optimal. If we want a single solution, we have to go to the completion of the rationals, 
i.e., to the reals. 
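
The classical problem can be verified by elementary calculus. The short derivation below assumes that the objective function is $f(x) = x^3 - 6x$, which is consistent with the minimizer $\sqrt{2}$ stated above:

```latex
\begin{align*}
f(x) &= x^3 - 6x, \qquad f'(x) = 3x^2 - 6 = 0 \iff x = \sqrt{2} \quad (x \ge 0), \\
f''(x) &= 6x > 0 \ \text{at } x = \sqrt{2}, \qquad
f(\sqrt{2}) = 2\sqrt{2} - 6\sqrt{2} = -4\sqrt{2}.
\end{align*}
```

Since $f'$ is negative on $[0, \sqrt{2})$ and positive on $(\sqrt{2}, \infty)$, the critical point is the global minimum over the nonnegative reals. Rational values such as 1.4, 1.41, 1.414 give function values that approach, but never reach, $-4\sqrt{2}$.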

The graph optimization problem may take a bit more effort to solve, but it is not hard to 
show that if the edge-density is 1/2, then the 4-cycle density is larger than 1/16. Furthermore, 
this density gets arbitrarily close to 1/16 for appropriate families of graphs: the most important 
example is a random graph with edge-density 1/2 (cf. also Section 1.4.3 and Theorem 9.5).

This suggests that we could try to enlarge the set of (finite) graphs with new objects so that 
the appropriate extension of our optimization problem has a solution among the new objects. 
Furthermore, we want these new objects to be approximable by graphs, just like real
numbers are approximable by rationals. 

Many of the basic tools in the theory of very large graphs have been first applied in extremal 
graph theory: the Regularity Lemma [113], convergent graph sequences [43], quasirandom graphs
[117], [33].

The example above shows that limit objects may provide cleaner formulations of extremal 
graph theory results, with no error terms. In some cases this goes further, and the limit objects 
provide a way to state, in an exact way, questions like "What do extremal graphs look like?".
They have similar uses in the theory of computing. We discuss these applications in Sections 8
and 9.

1.6 Mathematical tools 

It is clear from the above that this area is at the crossroads of different fields of mathematics. 
Graph theory and computer science are the main sources, and probability and mathematical 
statistics are crucial tools. Group theory, in particular the theory of finitely generated groups, has provided
many of the questions and ideas in the theory of limits of graphs with bounded degree. Ergodic 
theory may play a similar role in the dense case. Measure theory is needed, and an important 
new general proof method uses nonstandard analysis. 

We will discuss one further tool, namely Frobenius algebras, which are used in the proofs of
characterization theorems of homomorphism functions, but also in some other studies of graph 
parameters; see Section \TM 

2 Graph parameters 

A graph parameter is a real valued function defined on isomorphism types of graphs (including the
graph $K_0$ with no nodes and edges). A simple graph parameter is defined only on isomorphism
types of simple graphs (i.e., on graphs with no loops or multiple edges). A graph parameter f
is multiplicative if $f(G) = f(G_1) f(G_2)$ whenever G is the disjoint union of $G_1$ and $G_2$. We say
that a graph parameter is normalized if its value on $K_1$, the graph with one node and no edge,
is 1. Note that if a graph parameter is multiplicative and not identically 0, then its value on $K_0$
(the graph with no nodes and no edges) is 1, since every graph G is the disjoint union of G and $K_0$.

2.1 Connection matrices and reflection positivity 

A k-labeled graph is a graph in which k of the nodes are labeled by 1, ..., k (there may be any
number of unlabeled nodes). A 0-labeled graph is just an unlabeled graph.

Let $F_1$ and $F_2$ be two k-labeled graphs. We define the k-labeled graph $F_1F_2$ by taking their
disjoint union, and then identifying nodes with the same label. Clearly this multiplication is
associative and commutative. For two 0-labeled graphs, $F_1F_2$ is their disjoint union.

Let f be any graph parameter and fix an integer $k \ge 0$. We define the k-th connection matrix
of the graph parameter f as the (infinite) symmetric matrix $M(f,k)$, whose rows and columns
are indexed by (isomorphism types of) k-labeled graphs, and the entry in the intersection of the
row corresponding to $F_1$ and the column corresponding to $F_2$ is $f(F_1F_2)$.

We call the graph parameter reflection positive if all the corresponding connection matrices 
are positive semidefinite. 

2.2 Homomorphisms from the left

2.2.1 Versions of homomorphism numbers

For two finite graphs F and G, let hom(F, G) denote the number of homomorphisms of F
into G (adjacency-preserving maps from V(F) to V(G)), inj(F, G) the number of injective
homomorphisms of F into G, and ind(F, G) the number of embeddings of F into G as an induced
subgraph.

These quantities are closely related:

$$\operatorname{inj}(F,G) = \sum_{F' \supseteq F} \operatorname{ind}(F',G),$$

where F' ranges over all graphs obtained from F by adding edges, and

$$\hom(F,G) = \sum_{F''} \operatorname{inj}(F'',G),$$

where F'' ranges over all graphs obtained from F by identifying nodes. Conversely, ind can be
expressed by inj, which in turn can be expressed by hom using inclusion-exclusion.

This definition can be extended to the case when G has nodeweights $\alpha_u$ and edgeweights $\beta_{uv}$:

$$\hom(F,G) = \sum_{\varphi:\, V(F)\to V(G)} \ \prod_{u\in V(F)} \alpha_{\varphi(u)}(G) \prod_{uv\in E(F)} \beta_{\varphi(u)\varphi(v)}(G).$$

We often normalize these homomorphism numbers, and consider the homomorphism densities

$$t(F,G) = \frac{\hom(F,G)}{|V(G)|^{|V(F)|}},$$

which is the probability that a random map of V(F) into V(G) is a homomorphism. We can
define similarly

$$t_{\rm inj}(F,G) = \frac{\operatorname{inj}(F,G)}{n(n-1)\cdots(n-k+1)}$$

and

$$t_{\rm ind}(F,G) = \frac{\operatorname{ind}(F,G)}{n(n-1)\cdots(n-k+1)},$$

where $n = |V(G)|$ and $k = |V(F)|$. We have

$$t_{\rm inj}(F,G) = \sum_{F' \supseteq F} t_{\rm ind}(F',G) \qquad (4)$$

and the inversion formula

$$t_{\rm ind}(F,G) = \sum_{F' \supseteq F} (-1)^{|E(F') \setminus E(F)|}\, t_{\rm inj}(F',G). \qquad (5)$$

For hom and inj the relationship is not so simple due to the different normalization, but recalling
that we are interested in large graphs G, the following fact is usually enough to go between them:

$$t_{\rm inj}(F,G) - t(F,G) = O\Bigl(\frac{1}{|V(G)|}\Bigr). \qquad (6)$$

We note that $t_{\mathrm{ind}}(F,G)$ is the probability that, sampling |V(F)| nodes of G, we see the graph 
F. So it follows that (for very large graphs, up to the error in (6)) subgraph sampling provides 
the same information as any of the homomorphism densities $t$, $t_{\mathrm{inj}}$, $t_{\mathrm{ind}}$. 
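
The relations (4), (5) and (6) can be verified exactly on a small example. The sketch below uses exact rational arithmetic and assumes the same hypothetical (n, edges) representation of graphs. For (6) it uses the explicit constant $\binom{k}{2}/n$, which is the standard union bound on the probability that a random map is not injective; this constant is an assumption added here and is not stated in the text.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb, perm

def adjacency(edges):
    return {(a, b) for a, b in edges} | {(b, a) for a, b in edges}

def t(F, G):
    """Homomorphism density: probability that a random map is a homomorphism."""
    (k, eF), (n, eG) = F, G
    adj = adjacency(eG)
    count = sum(all((p[u], p[v]) in adj for u, v in eF)
                for p in product(range(n), repeat=k))
    return Fraction(count, n ** k)

def t_inj(F, G):
    """Injective homomorphism density, normalized by n(n-1)...(n-k+1)."""
    (k, eF), (n, eG) = F, G
    adj = adjacency(eG)
    count = sum(all((p[u], p[v]) in adj for u, v in eF)
                for p in permutations(range(n), k))
    return Fraction(count, perm(n, k))

def t_ind(F, G):
    """Induced density: probability that k sampled nodes show exactly F."""
    (k, eF), (n, eG) = F, G
    adjF, adjG = adjacency(eF), adjacency(eG)
    count = sum(all(((u, v) in adjF) == ((p[u], p[v]) in adjG)
                    for u, v in combinations(range(k), 2))
                for p in permutations(range(n), k))
    return Fraction(count, perm(n, k))

def supergraphs(F):
    """All graphs on V(F) whose edge set contains E(F)."""
    k, eF = F
    present = adjacency(eF)
    missing = [e for e in combinations(range(k), 2) if e not in present]
    for r in range(len(missing) + 1):
        for extra in combinations(missing, r):
            yield k, list(eF) + list(extra)

P3 = (3, [(0, 1), (1, 2)])
G6 = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)])

eq4 = t_inj(P3, G6) == sum(t_ind(Fp, G6) for Fp in supergraphs(P3))
eq5 = t_ind(P3, G6) == sum((-1) ** (len(Fp[1]) - len(P3[1])) * t_inj(Fp, G6)
                           for Fp in supergraphs(P3))
eq6 = abs(t_inj(P3, G6) - t(P3, G6)) <= Fraction(comb(3, 2), 6)
print(eq4, eq5, eq6)
```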

2.2.2 Spectra 

Homomorphisms of "small" graphs into G are related to sampling, as mentioned earlier. There 
are less obvious applications of these numbers. 

Example 2.1 If $C_k$ denotes the cycle on k nodes, then $\hom(C_k, G)$ is the trace of the k-th power 
of the adjacency matrix of the graph G. In other words, 

$$\hom(C_k,G) = \sum_{i=1}^n \lambda_i^k,$$

where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the adjacency matrix of G. From here, eigenvalues with 
large absolute value can be recovered. For example, $\hom(C_{2k},G)^{1/(2k)}$ tends to the largest 
eigenvalue of G as $k\to\infty$. 
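
The trace identity of Example 2.1 is easy to test numerically. The sketch below, which assumes G is given by a 0/1 adjacency matrix and uses hypothetical helper names, compares a brute-force count of homomorphisms of the k-cycle with the trace of the k-th matrix power. For k = 2 the "cycle" is read as a double edge, so both sides equal twice the number of edges.

```python
from itertools import product

def adjacency_matrix(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def hom_cycle(k, A):
    """Brute-force count of homomorphisms from the k-cycle into the graph with matrix A."""
    n = len(A)
    return sum(all(A[p[i]][p[(i + 1) % k]] for i in range(k))
               for p in product(range(n), repeat=k))

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def trace_of_power(A, k):
    """Trace of A^k, computed with exact integer arithmetic."""
    P = A
    for _ in range(k - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))

G = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
K4 = adjacency_matrix(4, [(i, j) for i in range(4) for j in range(i + 1, 4)])

print(all(hom_cycle(k, G) == trace_of_power(G, k) for k in range(2, 7)))
# The adjacency matrix of K4 has eigenvalues 3, -1, -1, -1, so the 2k-th root approaches 3.
print(trace_of_power(K4, 20) ** (1 / 20))
```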

2.3 Homomorphisms to the right 

2.3.1 Colorings and independent sets 

Several important graph parameters can be expressed in terms of homomorphisms into fixed 
"small" graphs. 

Example 2.2 If $K_q$ denotes the complete graph with q nodes (no loops), then $\hom(G, K_q)$ is 
the number of colorings of the graph G with q colors, satisfying the usual condition that adjacent 
nodes must get different colors. 

Example 2.3 Let H be obtained from K2 by adding a loop at one of the nodes. Then 
hom(G, H) is the number of independent sets of nodes in G. 
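
Both examples can be checked by direct enumeration. In the sketch below the target graph is described by its set of allowed ordered pairs, so that a loop is simply a pair (i, i); this encoding and the function names are assumptions made for the illustration.

```python
from itertools import combinations, product

def hom(G, target_pairs, q):
    """Count maps V(G) -> {0..q-1} sending every edge of G to an allowed ordered pair."""
    n, edges = G
    return sum(all((p[u], p[v]) in target_pairs for u, v in edges)
               for p in product(range(q), repeat=n))

def complete_graph_pairs(q):
    """Allowed pairs of K_q: all ordered pairs of distinct nodes (no loops)."""
    return {(i, j) for i in range(q) for j in range(q) if i != j}

# K_2 on nodes {0, 1} with a loop added at node 0: the nodes mapped to 1
# must be pairwise non-adjacent, i.e. they form an independent set.
H_INDEPENDENT = {(0, 1), (1, 0), (0, 0)}

def count_independent_sets(G):
    """Direct count of independent sets (including the empty set)."""
    n, edges = G
    edge_set = {frozenset(e) for e in edges}
    total = 0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            if all(frozenset(pair) not in edge_set for pair in combinations(S, 2)):
                total += 1
    return total

C4 = (4, [(0, 1), (1, 2), (2, 3), (3, 0)])

print(hom(C4, complete_graph_pairs(3), 3))    # proper 3-colorings of the 4-cycle
print(hom(C4, H_INDEPENDENT, 2), count_independent_sets(C4))
```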

2.3.2 Multicuts 

An important graph parameter is the maximum cut Maxcut(G), the maximum number of edges 
between a set $S \subseteq V(G)$ of nodes and its complement. While finding minimum cuts is perhaps 
more natural, the maximum cut problem comes up when we want to approximate general graphs 
by bipartite graphs, in computing ground states in statistical physics (see next section), and in 
many other applications. For our purposes, it will be more convenient to consider the normalized 
maximum cut, defined by 

$$\mathrm{maxcut}(G) = \frac{\mathrm{Maxcut}(G)}{|V(G)|^2} = \max_{S\subseteq V(G)} \frac{e_G(S, V\setminus S)}{|V(G)|^2}$$

(here $e_G(X,Y)$ denotes the number of edges in G connecting node sets X and Y). 

The following easy fact relates maximum cuts and homomorphism numbers. Let H be the 
weighted graph on {1,2} with nodeweights and edgeweights 1 except for the non-loop edge, 
which has weight 2. Then we have the trivial inequalities 

$$2^{\mathrm{Maxcut}(G)} \le \hom(G,H) \le 2^{|V(G)|}\, 2^{\mathrm{Maxcut}(G)},$$

which upon taking the logarithm and dividing by $|V(G)|^2$ becomes 

$$\mathrm{maxcut}(G) \le \frac{\log_2 \hom(G,H)}{|V(G)|^2} \le \mathrm{maxcut}(G) + \frac{1}{|V(G)|}. \tag{7}$$

So the homomorphism number into this simple 2-node graph determines maxcut(G) asymptotically. 
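
The two inequalities can be confirmed on a small graph, since hom(G, H) is here just the sum of $2^{\text{cut size}}$ over all 2-colorings of the nodes. The sketch below is a brute-force check with hypothetical helper names; nodeweights are passed as a list and edgeweights as a symmetric matrix whose diagonal entries are the loop weights.

```python
from itertools import product
from math import log2

def weighted_hom(G, alpha, beta):
    """hom(G, H) for a weighted target H with nodeweights alpha and edgeweights beta."""
    n, edges = G
    q = len(alpha)
    total = 0
    for p in product(range(q), repeat=n):
        weight = 1
        for u in range(n):
            weight *= alpha[p[u]]
        for u, v in edges:
            weight *= beta[p[u]][p[v]]
        total += weight
    return total

def max_cut(G):
    """Maxcut(G): largest number of edges between a node set and its complement."""
    n, edges = G
    return max(sum(p[u] != p[v] for u, v in edges)
               for p in product((0, 1), repeat=n))

ALPHA = [1, 1]
BETA = [[1, 2], [2, 1]]        # loops have weight 1, the non-loop edge has weight 2
C5 = (5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])

h = weighted_hom(C5, ALPHA, BETA)
mc = max_cut(C5)
n = C5[0]
print(2 ** mc <= h <= 2 ** n * 2 ** mc)
print(mc / n ** 2 <= log2(h) / n ** 2 <= mc / n ** 2 + 1 / n)
```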

An important extension of the maximum cut problem involves partitions into $q \ge 1$ classes 
instead of 2. Instead of just counting edges between different classes, we specify in advance 
numbers $\beta_{ij}$ ($i,j \in [q]$) such that $\beta_{ij} = \beta_{ji}$. We define the maximum multicut density (with the 
target weights $\beta_{ij}$) as 

$$\mathrm{mmcut}(G,\beta) = \max \frac{1}{|V(G)|^2} \sum_{i,j\in[q]} \beta_{ij}\, e_G(S_i,S_j),$$

where the maximum is taken over all partitions $\{S_1,\dots,S_q\}$ of V(G). 

A further important extension is to fix the proportion into which the cut separates the node 
set. For example, the "maximum bisection problem" asks for the maximum size of a cut that 
separates the nodes into two equal parts (we allow a difference of 1 if the number of nodes is 
odd). More precisely, we can formulate the restricted multicut problem as follows. We specify 
(in addition to the $\beta_{ij}$) numbers $\alpha_1,\dots,\alpha_q > 0$ with $\alpha_1 + \dots + \alpha_q = 1$. It is convenient to 
consider the parameters $\alpha_i$ and $\beta_{ij}$ as the nodeweights and edgeweights of a weighted graph H 
with V(H) = [q]. Then we are interested in 

$$\mathcal{E}(G,H) = \max \frac{1}{|V(G)|^2} \sum_{i,j\in[q]} \beta_{ij}\, e_G(S_i,S_j), \tag{8}$$

where $\{S_1,\dots,S_q\}$ ranges over all partitions of V(G) such that 

$$\big|\,|S_i| - \alpha_i|V(G)|\,\big| < 1 \qquad (i = 1,\dots,q). \tag{9}$$

(This can be defined for all graphs H with positive nodeweights, by scaling the nodeweights so 
that they sum to 1.) 

The following extension of (7) is easy to prove: for H fixed and $|V(G)| \to \infty$, 

$$\frac{\log_2 \hom(G,H)}{|V(G)|^2} = \mathrm{mmcut}(G,\beta) + O\Big(\frac{1}{|V(G)|}\Big). \tag{10}$$

(Note that $\log_2 \hom(G,H)/|V(G)|^2$ is asymptotically independent of the node weights of H.) 

The restricted maximum multicut problem is also related to counting homomorphisms, but 
the relationship is a little more complicated. Let G be a (very large) simple graph and H 
a weighted graph with V(H) = [q]. In the definition of t(G, H) we considered random maps 
$V(G) \to V(H)$, where the image of each node is chosen independently from the distribution on 
V(H) defined by the node weights. For most of these random maps $|\varphi^{-1}(i)| \approx \alpha_i(H)|V(G)|$ 
by the law of large numbers. It turns out that often it is advantageous to restrict ourselves to 
maps that are "typical" in this sense. More precisely, let S(G, H) denote the set of those maps 
$\varphi : V(G) \to V(H)$ for which $\big|\,|\varphi^{-1}(i)| - \alpha_i|V(G)|\,\big| < 1$ for all $i \in V(H)$. Using this notation, 
we can write 

$$\mathrm{rmcut}(G,H) = \max_{\varphi\in S(G,H)} \frac{1}{|V(G)|^2} \sum_{\substack{u,v\in V(G)\\ uv\in E(G)}} \beta_{\varphi(u)\varphi(v)}.$$

Let $H'$ be the weighted graph in which the edge weights are $\beta'_{ij} = \exp(\beta_{ij})$ instead of $\beta_{ij}$. If we 
define 

$$\hom^*(G,H) = \sum_{\varphi\in S(G,H)}\ \prod_{uv\in E(G)} \beta_{\varphi(u)\varphi(v)},$$

then the following estimate analogous to (10) holds for $|V(G)| \to \infty$: 

$$\mathrm{rmcut}(G,H) = \frac{\log \hom^*(G,H')}{|V(G)|^2} + O\Big(\frac{1}{|V(G)|}\Big). \tag{11}$$

2.3.3 Statistical physics 

Graph homomorphism functions can be used to express partition functions of various statistical 
physical models. Two basic types of such models are "hard-core" and "soft-core". 

To describe an example of a hard-core model, let G be an n x n grid, and suppose that every 
node of G (every "site") can be in one of two states, "UP" or "DOWN". The properties of the 
system are such that no two adjacent sites can be "UP". A "configuration" is a valid assignment 
of states to each node. The number of configurations is the number of independent sets of nodes 
in G, which in turn can be expressed as the number of homomorphisms of G into the graph H 
consisting of two nodes, "UP" and "DOWN", connected by an edge, and with an additional loop 
at "DOWN". 

In a soft-core spin model the sites are again nodes of a graph G, which can be in one of q 
possible states. For any two states i and j, we specify an "energy of interaction" in the form of a 
real number $J_{ij}$. A given configuration (assignment of states) is given by a map $\varphi : V(G) \to [q]$, 
and its "energy density" is expressed as 

$$\mathcal{E}_\varphi = \frac{1}{|V(G)|^2} \sum_{uv\in E(G)} J_{\varphi(u),\varphi(v)}. \tag{12}$$

From this, one defines the partition function as 

$$Z(G,J) = \sum_{\varphi:\,V(G)\to[q]} \exp(-\mathcal{E}_\varphi). \tag{13}$$

Another important quantity is the ground state energy 

$$\mathcal{E}(G,J) = \min_{\varphi:\,V(G)\to[q]} \mathcal{E}_\varphi. \tag{14}$$

Note that both of these quantities are familiar: if we take $\beta = -J$, then $\mathcal{E}(G,J) = -\mathrm{mmcut}(G,\beta)$, 
and if we take $\beta_{ij} = \exp(J_{ij})$, then $Z(G,J) = \hom(G,\exp(\beta))$. Even restricted multiway cuts 
correspond to a quantity studied in statistical physics: it is called microcanonical ground state 
energy there. 

The above definitions don't work well for dense graphs G: as remarked after (10), the numbers 
$\log_2 \hom(G,H)/|V(G)|^2$ are essentially independent of the node weights of H, so we lose 
information here. In the mean-field theory, we define the mean field partition function of a 
simple graph G by 

$$Z(G,J) = \sum_{\varphi:\,V(G)\to[q]} e^{-|V(G)|\,\mathcal{E}_\varphi}. \tag{15}$$

The free energy is defined by 

$$-\frac{\ln Z(G,J)}{|V(G)|}. \tag{16}$$

Note that the normalization is different from (13) in the exponent and therefore we only divide 
by |V(G)| (as opposed to (10)). 

For more about this connection, we refer to [30]. 

2.4 Homomorphism densities in the sparse case 

The best analogue for sparse graphs of the homomorphism density t(F, G) is 

$$s(F,G) = \frac{\hom(F,G)}{|V(G)|}, \tag{17}$$

which we consider for connected graphs F. We can interpret this number as follows. For 
$u \in V(F)$ and $v \in V(G)$, let $\hom_{u\to v}(F,G)$ denote the number of homomorphisms $\varphi$ of F into 
G with $\varphi(u) = v$. Now we fix any node u of F and select a uniform random node v of G. Then 
s(F, G) is the expectation of $\hom_{u\to v}(F,G)$. We can interpret 

$$s_{\mathrm{inj}}(F,G) = \frac{\mathrm{inj}(F,G)}{|V(G)|}, \qquad s_{\mathrm{ind}}(F,G) = \frac{\mathrm{ind}(F,G)}{|V(G)|}$$

similarly. 
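
The interpretation of s(F, G) as an expectation does not depend on which node u of F is fixed, because summing the rooted counts over all v always gives hom(F, G). A minimal sketch of this check, with hypothetical function names and the (n, edges) representation assumed for illustration:

```python
from fractions import Fraction
from itertools import product

def adjacency(edges):
    return {(a, b) for a, b in edges} | {(b, a) for a, b in edges}

def hom_rooted(F, G, u, v):
    """Number of homomorphisms phi: F -> G with phi(u) = v."""
    (k, eF), (n, eG) = F, G
    adj = adjacency(eG)
    return sum(p[u] == v and all((p[a], p[b]) in adj for a, b in eF)
               for p in product(range(n), repeat=k))

def s(F, G):
    """Sparse homomorphism density hom(F, G) / |V(G)|."""
    n = G[0]
    return Fraction(sum(hom_rooted(F, G, 0, v) for v in range(n)), n)

def expected_rooted(F, G, u):
    """Expectation of hom_{u -> v}(F, G) over a uniform random node v of G."""
    n = G[0]
    return Fraction(sum(hom_rooted(F, G, u, v) for v in range(n)), n)

P3 = (3, [(0, 1), (1, 2)])            # path on 3 nodes
P4 = (4, [(0, 1), (1, 2), (2, 3)])    # path on 4 nodes (bounded degree host graph)

print(s(P3, P4), [expected_rooted(P3, P4, u) for u in range(3)])
```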

Remark 2.4 For bounded degree graphs the order of magnitude of hom(F, G) (where F is fixed 
and |V(G)| tends to infinity) is $|V(G)|^{c(F)}$, where c(F) is the number of connected components 
of F. But since hom(F, G) is multiplicative over the connected components of F, we don't lose 
any information if we restrict the definition s(F, G) to connected graphs F. 

The sparse homomorphism densities (17) contain the same information as the distribution of 
neighborhood samples. The proof of this is a bit trickier here than in the dense case. 

From the interpretation of s(F, G) given above, we see that it can be obtained as the 
expectation of $\hom_{u\to v}(F,B)$, where B is a random ball from the neighborhood 
sample distribution $\rho_{G,r}$ with center v and radius r = |V(F)|. 

To compute the neighborhood sample distributions from the quantities s(F, G), we first 
express the quantities $s_{\mathrm{inj}}(F,G)$ via inclusion-exclusion. By a similar argument, we can express 
the quantities $s_{\mathrm{ind}}(F,G)$. 

Next, we consider graphs F together with maps $\delta : V(F) \to \{0,\dots,d\}$, and we determine 
the numbers 

$$s_{\mathrm{ind}}(F,\delta,G) = \frac{\mathrm{ind}(F,\delta,G)}{|V(G)|},$$

where $\mathrm{ind}(F,\delta,G)$ is the number of injections $\varphi : V(F) \to V(G)$ which embed F in G as an 
induced subgraph, so that the degree of $\varphi(v)$ is $\delta(v)$. This is again done by an inclusion-exclusion 
argument. 

Given a ball B of radius r, the fraction of nodes $v \in V(G)$ for which B(v, r) = B is 
$\sum_{\delta} s_{\mathrm{ind}}(B,\delta,G)$, where the summation extends over all functions $\delta$ which assign to each node 
of B at distance < r from the root its degree in B. This proves that homomorphism densities 
and neighborhood sampling are equivalent. 

2.5 Characterizing homomorphism numbers 

Multigraph parameters of the form hom(·, H), where H is a weighted graph, were characterized 
by Freedman, Lovász and Schrijver [51]. 

Theorem 2.5 Let f be a graph parameter defined on multigraphs without loops. Then f is equal 
to hom(·, H) for some weighted graph H on q nodes if and only if it is reflection positive and 
$\mathrm{rk}(M(f,k)) \le q^k$ for all k. 

Several improvements and versions of this result have been obtained. It is shown in [SS] that 
it is enough to assume the rank condition for $k \le 2$. Analogous characterizations can be given 
for graph parameters of the form hom(·, H) where the nodeweights in H are all 1 [106], and 
where H is an unweighted graph without multiple edges (but with loops allowed) [81]. There is 
also an analogous (dual) characterization of graph parameters of the form hom(F, ·), defined on 
simple graphs with loops, where F is also a simple graph with loops [5T]. These results can be 
extended to directed graphs, hypergraphs, semigroups, and indeed, to all categories satisfying 
reasonable conditions [82]. 

The two conditions on connection matrices in the theorem have interesting uses of their own. 

2.5.1 Reflection positivity and extremal graph theory 

Theorem 6.13 will give a number of equivalent (cryptomorphic) descriptions of limit objects 
of growing graph sequences, and it can be used to characterize all reflection positive graph 
parameters, see Corollary 6.14. 

Reflection positivity implies a number of very useful relations between the densities of various 
subgraphs in a given graph, which in turn can be used to prove results in extremal graph theory. 
We will illustrate this in Section O 

We'll return to applications of reflection positivity of connection matrices in the context of 
continuous generalizations of graphs (Section [9]) and in extremal graph theory (Section [9|). 

2.5.2 Finite connection rank 

The finiteness of the rank of connection matrices is also interesting. One reason to be interested 
in this question is the fact that such a graph parameter can be evaluated in polynomial time for 
graphs with bounded treewidth [78]. 

There are several examples of graph parameters with finite connection rank [77]: the number 
of perfect matchings, the number of all matchings, the number of Hamiltonian cycles, any 
evaluation of the Tutte polynomial. 

A challenging problem is to determine all graph parameters for which all the connection 
matrices have finite rank. Homomorphism functions hom(·, H) are examples for every weighted 
graph H (here the nodeweights and edgeweights can be negative). Dual homomorphism densities 
hom(F, ·) also have finite connection rank. Every evaluation of the Tutte polynomial is a further 
example. 

Very recently Godlin and Makowski proved that all graph parameters which are evaluations 
of graph polynomials definable in Monadic Second Order Logic have finite connection rank. This 
result can be used mostly as a tool to prove that certain properties are not definable this way. 

Further variants of this problem ask for the characterization of graph parameters with 
exponentially bounded connection rank, or polynomially bounded connection rank. 

2.6 Graph algebras 

A quantum graph is defined as a formal linear combination of a finite number of graphs with 
real coefficients. For every quantum graph x, let N(x) be the maximum number of nodes in the 
graphs occurring in x with nonzero coefficient. The definition of hom(F, G) and t(F, G) extends 
to quantum graphs linearly: if $f = \sum_{i=1}^{n} \lambda_i F_i$ and $g = \sum_{j=1}^{m} \mu_j G_j$, then we define 

$$\hom(f,g) = \sum_{i=1}^{n}\sum_{j=1}^{m} \lambda_i \mu_j \hom(F_i,G_j).$$
Quantum graphs are useful in expressing various combinatorial situations. For example, for any 
graph F we define 

$$\hat{F} = \sum_{\substack{F':\ V(F') = V(F)\\ E(F') \supseteq E(F)}} (-1)^{|E(F')\setminus E(F)|}\, F'. \tag{18}$$

Then $t(\hat{F}, G)$ is just the probability that a random map $V(F) \to V(G)$ preserves adjacency as 
well as non-adjacency. 

Let f be any graph parameter and fix an integer $k \ge 0$. Let $\mathcal{Q}_k$ denote the (infinite 
dimensional) vector space of all k-labeled quantum graphs. We can turn $\mathcal{Q}_k$ into an algebra by using 
$F_1F_2$ introduced above as the product of two generators, and then extending this multiplication 
to the other elements linearly. Clearly $\mathcal{Q}_k$ is associative and commutative. The graph $O_k$ on k 
nodes with no edges is the multiplicative unit in $\mathcal{Q}_k$. If all nodes of F are labeled, then both F 
and the quantum graph $\hat{F}$ introduced above (keeping the node labels) are idempotent: $F^2 = F$ 
and $\hat{F}^2 = \hat{F}$. 

Every graph parameter f can be extended linearly to quantum graphs, and defines an inner 
product on $\mathcal{Q}_k$ by 

$$\langle x,y\rangle := f(xy). \tag{19}$$

This means that our graph algebra is a Frobenius algebra (see [TQ]). This inner product has nice 
properties, for example 

$$\langle x,yz\rangle = \langle xy,z\rangle. \tag{20}$$

Let $\mathcal{N}_k(f)$ denote the kernel of this inner product, i.e., 

$$\mathcal{N}_k(f) := \{x \in \mathcal{Q}_k :\ f(xy) = 0\ \ \forall y \in \mathcal{Q}_k\}.$$

Then we can define the factor algebra 

$$\mathcal{Q}_k/f := \mathcal{Q}_k/\mathcal{N}_k(f).$$

Example 2.6 As an example, consider the number pm(G) of perfect matchings in the graph G. 
It is a basic property of this value that subdividing an edge by two nodes does not change it. 
This can be expressed as $P_4 - P_2 \in \mathcal{N}_2(\mathrm{pm})$, where $P_k$ denotes the path with k nodes, of which 
the two endnodes are labeled. 
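
The invariance used in Example 2.6 can be seen directly: in the subdivided graph either the two new nodes are matched to each other (matchings of G avoiding the edge) or each is matched to an old endpoint (matchings of G using the edge). The sketch below checks this by brute force; the function names and the edge-list representation are assumptions made for the illustration.

```python
def perfect_matchings(n, edges):
    """Count perfect matchings of a (multi)graph on nodes 0..n-1 given as a list of edges."""
    def count(unmatched):
        if not unmatched:
            return 1
        u = min(unmatched)              # the smallest unmatched node must be covered
        total = 0
        for a, b in edges:
            if u in (a, b):
                w = b if a == u else a
                if w != u and w in unmatched:
                    total += count(unmatched - {u, w})
        return total
    return count(frozenset(range(n)))

def subdivide_twice(n, edges, index):
    """Replace edge number `index`, say (u, v), by the path u - n - (n+1) - v."""
    u, v = edges[index]
    rest = edges[:index] + edges[index + 1:]
    return n + 2, rest + [(u, n), (n, n + 1), (n + 1, v)]

K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
C6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(perfect_matchings(4, K4), perfect_matchings(6, C6))
print(all(perfect_matchings(*subdivide_twice(4, K4, i)) == perfect_matchings(4, K4)
          for i in range(len(K4))))
```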

We can introduce a third "product": the tensor product $G \otimes H$ of a k-labeled graph G and 
an l-labeled graph H is defined as the (k + l)-labeled graph obtained as the disjoint union of G 
and H, where the labels in H are increased by k. If k = l = 0, then the tensor product is the 
same as the product in the algebra $\mathcal{Q}_0$. 

The parameter f is reflection positive if and only if the inner product (19) is positive 
semidefinite on $\mathcal{Q}_k$; equivalently, positive definite on $\mathcal{Q}_k/f$, so it turns $\mathcal{Q}_k/f$ into a Hilbert space. In fact, 
the factor algebra $\mathcal{Q}_k/f$ is a finite dimensional commutative $\mathbb{R}$-algebra, which has both a 
commutative and associative product and a positive definite inner product, related by $\langle x,yz\rangle = \langle xy,z\rangle$. 

The dimension of $\mathcal{Q}_k/f$ is the rank of the connection matrix. If this rank is a finite number 
m and the parameter is reflection positive, it follows that $\mathcal{Q}_k/f$ is isomorphic to $\mathbb{R}^m$ endowed with 
the coordinate-wise product and the usual inner product. 

There are many algebraically interesting connections between these algebras, for example, 
there is an embedding given by the tensor product 

$$\mathcal{Q}_k/f \otimes \mathcal{Q}_l/f \hookrightarrow \mathcal{Q}_{k+l}/f, \tag{21}$$

which shows that $\dim(\mathcal{Q}_k/f)$ is a superadditive function of k. 

This nice algebraic structure can be exploited in various ways [211 [751 IHll IM] ■ Let us sketch 
the proof of Theorem 2.5 in an (easier) special case: when there is no degeneracy in the sense 
that the embedding in (21) is an isomorphism (this is in fact the generic case, which occurs 
whenever f = hom(·, H), where H has no "twin" nodes nor any nontrivial automorphism). So 
we have $\dim(\mathcal{Q}_k/f) = q^k$ for all k. 

Let $p_1,\dots,p_q$ be the basis of $\mathcal{Q}_1/f$ consisting of idempotents (corresponding to the standard 
basis vectors in $\mathbb{R}^q$). Define $p_\varphi = p_{\varphi(1)} \otimes \cdots \otimes p_{\varphi(k)}$ for all $\varphi : [k] \to [q]$; then the k-labeled 
quantum graphs $p_\varphi$ form a basis of $\mathcal{Q}_k/f$ consisting of idempotents. 

We can define a weighted complete graph H on [q] as follows: let $\alpha_i = f(p_i)$ and define $\beta_{ij}$ 
by expressing the graph $k_2$ (a single edge with both nodes labeled) in the idempotent basis: 

$$k_2 = \sum_{i,j} \beta_{ij}\,(p_i \otimes p_j).$$

This defines nodeweights $\alpha_i$ and edgeweights $\beta_{ij}$ for H. The nodeweights are positive, since 

$$\alpha_i = f(p_i) = f(p_i^2) > 0.$$

The definition of the $\beta_{ij}$ implies that 

$$k_2\,(p_i \otimes p_j) = \beta_{ij}\,(p_i \otimes p_j). \tag{22}$$

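We claim that the weighted graph H obtained this way satisfies f(G) = hom(G, H) for every 
multigraph G. Indeed, we may assume that V(G) = [k] and all nodes of G are labeled. Then 
we can write 

$$G = \prod_{uv\in E(G)} K_{uv},$$

where $K_{uv}$ consists of k labeled nodes and a single edge connecting u and v. Equation (22) 
implies that 

$$K_{uv}\, p_\varphi = \beta_{\varphi(u)\varphi(v)}\, p_\varphi.$$

Using (22) repeatedly, we get 

$$G = G \Big(\sum_{\varphi:\,[k]\to[q]} p_\varphi\Big) = \sum_{\varphi:\,[k]\to[q]}\ \prod_{uv\in E(G)} K_{uv}\, p_\varphi = \sum_{\varphi:\,[k]\to[q]} \Big(\prod_{uv\in E(G)} \beta_{\varphi(u)\varphi(v)}\Big)\, p_\varphi,$$

and so 

$$f(G) = \sum_{\varphi:\,[k]\to[q]}\ \prod_{uv\in E(G)} \beta_{\varphi(u)\varphi(v)} \prod_{i=1}^{k} \alpha_{\varphi(i)} = \hom(G,H).$$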

3 Graph-like structures on probability spaces 

The aim of this section is to introduce certain analytic objects, which will serve as limit objects 
for graph sequences, separately in the dense and sparse case. It is an interesting feature of these 
structures that they have come up in different studies. 

In the dense case, several versions of these objects turn out to be equivalent; graphons are 
very simple objects (2-variable measurable functions), but they turn out to be equivalent, among 
others, to exchangeable random variables. 

In the bounded degree case, several related, but non-equivalent notions have been proposed, 
at least one of which (graphings) is also known from group theory. 

3.1 Graphons 

Let $\mathcal{W}$ denote the space of all bounded symmetric measurable functions $W : [0,1]^2 \to \mathbb{R}$ (i.e., 
W(x, y) = W(y, x) for all $x, y \in [0,1]$). Let $\mathcal{W}_0$ denote the set of all functions $W \in \mathcal{W}$ such that 
$0 \le W \le 1$. 

A function $W \in \mathcal{W}$ is called a stepfunction, if there is a partition $S_1 \cup \dots \cup S_k$ of [0, 1] into 
measurable sets such that W is constant on every product set $S_i \times S_j$. The number k is the 
number of steps of W. 

For every weighted graph G, we define a stepfunction $W_G \in \mathcal{W}_0$ as follows. Let V(G) = [n]. 
Split [0, 1] into n intervals $J_1,\dots,J_n$ of length $\lambda(J_i) = \alpha_i/\alpha_G$. For $x \in J_i$ and $y \in J_j$, let 

$$W_G(x,y) = \beta_{ij}(G).$$

Let $W \in \mathcal{W}$ and let $\varphi : [0,1] \to [0,1]$ be a measure preserving map. We can define another 
function $W^\varphi$ by 

$$W^\varphi(x,y) = W(\varphi(x),\varphi(y)).$$

From the point of view of using these functions as continuous analogues of graphs, the functions 
W and $W^\varphi$ are not essentially different (they are related like two isomorphic graphs in which the 
nodes are labeled differently). One has to be a little careful though, because measure preserving 
maps are not necessarily invertible, and so the relationship between W and $W^\varphi$ is not symmetric. 
We call two graphons W and W' weakly isomorphic, if there is a third graphon U and measure 
preserving maps $\varphi, \varphi' : [0,1] \to [0,1]$ such that $W = U^{\varphi}$ and $W' = U^{\varphi'}$ almost everywhere. It 
is not hard to show that weak isomorphism is an equivalence relation. 

Equivalence classes of functions in $\mathcal{W}_0$ under weak isomorphism are called graphons. (Sometimes 
we call a function $W \in \mathcal{W}_0$ a graphon; by analogy with graphs, these functions could be 
called "labeled graphons".) 

3.1.1 Homomorphisms into graphons and from graphons 

Counting homomorphisms into graphs extends to counting homomorphisms into graphons in the 
following sense: For every $W \in \mathcal{W}$ and simple graph F = (V, E), define 

$$t(F,W) = \int_{[0,1]^V}\ \prod_{ij\in E} W(x_i,x_j) \prod_{i\in V} dx_i.$$

Then it is easy to verify that for every graph G, 

$$t(F,G) = t(F,W_G). \tag{23}$$
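
Since t(F, W) is the expectation of a product of values of W at independent uniform points, it can be estimated by plain Monte Carlo sampling. The following sketch assumes W is given as a Python function on $[0,1]^2$; the name `t_estimate`, the sample size and the test graphon W(x, y) = xy are choices made for this illustration. For that W the integrals factor, giving t = 1/4 for a single edge and t = 1/27 for the triangle.

```python
import random

def t_estimate(F, W, samples, rng):
    """Monte Carlo estimate of t(F, W): average of prod_{ij in E(F)} W(x_i, x_j)."""
    k, edges = F
    total = 0.0
    for _ in range(samples):
        x = [rng.random() for _ in range(k)]     # independent uniform points in [0, 1]
        term = 1.0
        for i, j in edges:
            term *= W(x[i], x[j])
        total += term
    return total / samples

def product_graphon(x, y):
    """Example graphon W(x, y) = x * y (symmetric, values in [0, 1])."""
    return x * y

EDGE = (2, [(0, 1)])
TRIANGLE = (3, [(0, 1), (1, 2), (0, 2)])

rng = random.Random(0)
print(t_estimate(EDGE, product_graphon, 20000, rng))        # close to 1/4
print(t_estimate(TRIANGLE, product_graphon, 20000, rng))    # close to 1/27
```

The estimate has an error of order one over the square root of the number of samples, so it is only a numerical illustration of the definition.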

Of the two modified versions of homomorphism densities (2) and (3), the former has no 
significance in this context since a random assignment $i \mapsto x_i$ ($i \in V(F)$, $x_i \in [0,1]$) is injective 
with probability 1. But the induced subgraph density is worth defining, and in fact it can be 
expressed as 

$$t_{\mathrm{ind}}(F,W) = \int_{[0,1]^V}\ \prod_{ij\in E} W(x_i,x_j) \prod_{ij\in\binom{V}{2}\setminus E} \big(1 - W(x_i,x_j)\big) \prod_{i\in V} dx_i. \tag{24}$$

We have then 

$$t_{\mathrm{ind}}(F,G) = t_{\mathrm{ind}}(F,W_G), \tag{25}$$

and the inclusion-exclusion formula (5) follows by expanding the parentheses in the integrand 
in (24). 

Borgs, Chayes and Lovász [26] proved that the homomorphism densities determine the 
graphon: 

Theorem 3.1 Two graphons are weakly isomorphic if and only if t(F, W) = t(F, W') for every 
simple graph F. 

A natural idea of the proof of this theorem would be to bring every graphon to a "canonical 
form", so that weakly isomorphic graphons would have identical canonical forms. In the case of 
functions in a single variable, a canonical form that works in many situations can be obtained 
through "monotonization": for every bounded real function on [0, 1] there is a unique monotone 
increasing left-continuous function on [0, 1] that has, among others, the same moments. For 
graphons this does not seem to be doable, but the proof of Theorem 3.1 goes by constructing, 
for every graphon W, a "canonical ensemble": a probability distribution on graphons on the 
same canonical σ-algebra and weakly isomorphic to W, such that two graphons are isomorphic 
if and only if their ensembles can be coupled so that corresponding graphons are identical. 

Alternate proofs of Theorem 3.1 have been given by Diaconis and Janson [35] using the 
theory of exchangeable random variables, and by Bollobás and Riordan [24] combining Theorem 
6.2 below with measure-theoretic arguments. 

There is probably no good way to define homomorphism numbers from graphons into graphs 
or into other graphons. The parameters related to such homomorphisms that extend naturally to 
graphons are defined by maximization, like the normalized maximum cut, and more generally, 
restricted maximum multiway cuts. Let H be a weighted graph with V(H) = [q] and W a 
graphon. Then we can define 

$$\mathcal{E}(W,H) = \max \sum_{i,j\in[q]} \beta_{ij}(H) \int_{S_i\times S_j} W(x,y)\,dx\,dy,$$

where $\{S_1,\dots,S_q\}$ ranges over all partitions of [0, 1] into measurable sets with $\lambda(S_i) = \alpha_i(H)$. 
This quantity does not exactly extend $\mathcal{E}(G,H)$ as defined in (8), but the error is small: it was 
proved in [30] that for a fixed weighted graph H, 



3.1.2 W-random graphs 

A graphon W gives rise to a way of generating random graphs that are more general than the 
Erdős-Rényi graphs. This construction was introduced by Lovász and Szegedy [55] and Bollobás, 
Janson and Riordan [22]. 

Given a graphon W and an integer n > 0, we can generate a random graph G(n, W) on node 
set [n] as follows: We generate n independent numbers $X_1,\dots,X_n$ from the uniform distribution 
on [0, 1], and then connect nodes i and j with probability $W(X_i,X_j)$, making an independent 
decision for distinct pairs. 

As a special case, if W is the identically p function, we get "ordinary" random graphs G(n, p). 

We can extend this construction to generating a countable random graph G(W) on $\mathbb{N}$: We 
generate an infinite sequence $X_1, X_2, \dots$ of uniformly distributed random points from [0, 1], and 
(as before) connect nodes i and j with probability $W(X_i,X_j)$. 
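
The finite construction translates directly into a sampler. The sketch below assumes W is a Python function with values in [0, 1]; the name `w_random_graph` and the use of a seeded `random.Random` instance are choices made here so that the sample is reproducible.

```python
import random

def w_random_graph(n, W, rng):
    """Sample G(n, W): draw X_1..X_n uniformly, then join i < j with probability W(X_i, X_j)."""
    xs = [rng.random() for _ in range(n)]
    edges = [(i, j)
             for i in range(n)
             for j in range(i + 1, n)
             if rng.random() < W(xs[i], xs[j])]     # independent decision for each pair
    return xs, edges

def constant_graphon(p):
    """W identically p, which gives the ordinary random graph G(n, p)."""
    return lambda x, y: p

rng = random.Random(2024)
xs, edges = w_random_graph(8, lambda x, y: x * y, rng)
print(len(xs), len(edges))
```

With `constant_graphon(1.0)` every pair is joined and with `constant_graphon(0.0)` none is, which gives a quick sanity check of the sampler.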

Graphons will come up in several ways in our discussions. In Theorem 6.13 we will collect 
the many disguises in which they occur. 

3.2 Graphings 

3.2.1 Measure preserving graphs 

Let G be a graph with node set [0, 1], with all degrees bounded by d. We call G measurable, if 
for every (Lebesgue) measurable set B the neighborhood N{B) in G is also measurable. 

For every set $A \subseteq [0,1]$ and $x \in [0,1]$, let $d_A(x)$ denote the number of neighbors of x in A. 
One can prove using the measurability of G that $d_A(x)$ is a measurable function of x. We say 
that G is measure preserving, if it is measurable and for any two measurable sets A, B, 

$$\int_A d_B(x)\,dx = \int_B d_A(x)\,dx. \tag{27}$$

Assuming that this relation holds, we can define a measure $\mu$ on the Borel sets of $[0,1]^2$ 
by $\mu(A \times B) = \int_A d_B(x)\,dx$. This measure is concentrated on the set of edges (which can be 
considered as a subset of $[0,1]^2$). Furthermore, the marginals of $\mu$ are absolutely continuous with 
respect to the Lebesgue measure, and their Radon-Nikodym derivative is the degree function. 

In every measure preserving graph G, we can define the density s(F, G) of a graph F. Indeed, 
let us recall that s(F, G) is the expectation of $\hom_{u\to v}(F,G)$, where u is a fixed node of F and 
v is a random node of G. Since we have a probability distribution on V(G), and $\hom_{u\to v}(F,G)$ 
is a bounded measurable function of v, this definition carries over verbatim. 

Similarly, we can talk about the neighborhood distributions $\rho_{G,m}$ in a measure preserving 
graph. 

3.2.2 Graphings 

Let $A_1,\dots,A_d, B_1,\dots,B_d$ be measurable subsets of [0, 1], and let $\varphi_i : A_i \to B_i$ be bijective 
measure preserving maps. The tuple $H = ([0,1], \varphi_1,\dots,\varphi_d)$ is called a graphing (see [53], [69]). 
From every graphing H we get a directed graph G on [0, 1] by connecting x and y in [0, 1] if 
there is an i such that $y = \varphi_i(x)$. The edges of this digraph are colored with d colors in such a 
way that each color-class defines a measure preserving bijection between two subsets of [0, 1]. 

Forgetting the orientation and the edge-colors of this digraph, we get a measure preserving 
graph with degrees bounded by 2d. A measure preserving graph with its edges colored and 
oriented so that each color defines a measure preserving bijection is equivalent to a graphing. 

It would be perhaps more natural to assume that the maps $\varphi_1,\dots,\varphi_d$ are involutions, in which 
case we get an undirected graph, and we can extend the $\varphi_i$ to measure preserving involutions 
$[0,1] \to [0,1]$. It is true that for every graphing there is such an involutive graphing defining the 
same measure preserving graph; but the number of maps may become much larger. 

Every measure preserving graph arises from a graphing: 

Theorem 3.2 Let G be a measure preserving graph with degrees bounded by d. Then there is a 
graphing H — ([0, 1], Lp\, . . . , ^Pr), where r < d^ , such that the underlying graph is G. 

One way of looking at a representation of a measure preserving graph as a graphing is that 
it provides a certificate that the graph is measure preserving. The graphing representing a given 
measure preserving graph may not be unique. 

Theorem 3.2 can be viewed as a measure preserving graph version of Shannon's Theorem, 
which asserts that the edges of a multigraph with maximum degree d can be colored by 3d/2 
colors. (For simple graphs, Vizing's Theorem gives the better bound of d + 1.) The bound is 
probably not optimal in the measure preserving version either. 

We will talk about s(F, H) if F is a (finite) graph and H is a graphing. This will mean simply 
s(F, G), where G is the underlying measure preserving graph. 




We note that both in measure preserving graphs and graphings, we could replace the 
probability space [0, 1] by any other standard probability space, but this would not lead to any gain in 
generality. However, in some cases the presentation of the measure preserving graph or graphing 
is more natural on other probability spaces. 

3.2.3 Random countable rooted graphs 

Measure preserving graphs are also related to certain probability distributions on rooted 
countable graphs, introduced by Benjamini and Schramm [16]. 

Let G be a measure preserving graph and choose a uniform random point $x \in [0,1]$. The 
connected component $G_x$ of G containing x is a countable graph with degrees bounded by d, 
and with a "root" node x. 

Let $\mathcal{G}_d$ denote the set of connected countable graphs with all degrees bounded by d, rooted 
at a node. Let $\mathcal{A}_d$ denote the σ-algebra on $\mathcal{G}_d$ generated by subsets obtained by fixing a finite 
neighborhood of the root. The map $x \mapsto G_x$ is measurable as a map $[0,1] \to (\mathcal{G}_d,\mathcal{A}_d)$, and thus 
every measure preserving graph G defines a probability distribution $\pi$ on $(\mathcal{G}_d,\mathcal{A}_d)$. 

Condition (27) implies the following property of the measure $\pi$. Selecting a rooted graph G from 
$\pi$ and then selecting a uniform random edge from the root, we get a probability distribution $\pi^*$ 
on the set $\mathcal{G}'_d$ of rooted graphs in $\mathcal{G}_d$ with an edge (the "root edge") from the root also specified. 
We say that $\pi$ is unimodular, if the map $\mathcal{G}'_d \to \mathcal{G}'_d$ obtained by shifting the root node to the 
other endnode of the root edge is measure preserving with respect to $\pi^*$. 

The measure on $\mathcal{G}_d$ obtained from a measure preserving graph is unimodular. Vice versa, 
every such measure is obtained from a graphing (and hence from a measure preserving graph; 
Elek [37]). 

4 The cut-distance of two graphs 

The definition of the distance of two arbitrary graphs is quite involved, and we will approach 
the problem in steps: starting with two graphs on the same node set, then moving to graphs 
with the same number of nodes (but unrelated), then moving to the general case. 

In this section we consider dense graphs. The definitions are of course valid for all graphs, 
but they give a distance of o(l) between two graphs with edge-density o(l). 

4.1 Two graphs on the same set of nodes 

Let G and G' be two graphs with a common node set [n]. The distance notion discussed here 
was initiated by Frieze and Kannan [52], and elaborated, e.g., in [53]. For an unweighted graph 
G = (V, E) and sets $S, T \subseteq V$, let $e_G(S,T)$ denote the number of edges in G with one endnode 
in S and the other in T (the endnodes may also belong to $S \cap T$; so $e_G(S,S)$ is twice the number 
of edges spanned by S). For two graphs G and G' on the same node set [n], we define their cut 
distance by 

$$d_\square(G,G') = \frac{1}{n^2} \max_{S,T\subseteq V(G)} |e_G(S,T) - e_{G'}(S,T)|.$$

Note that we are dividing by $n^2$ and not by |S| × |T|, which would look more natural. However, 
dividing by |S| × |T| would emphasize small sets too much, and the maximum would be attained 
when |S| = |T| = 1. With our definition, the contribution of a pair S, T is at most $|S| \cdot |T|/n^2$ 
(for simple graphs). 

It is easy to see that $d_\square(G,G') \le d_1(G,G')$, and in general the two sides are quite different. 
For example, if G and G' are two independent random graphs on [n] with edge probability 1/2, 
then with large probability $d_\square(G,G') = O(1/\sqrt{n})$. 
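
For small n the cut distance can be computed exactly. The sketch below works with adjacency matrices, so that $e_G(S,T)$ is the sum of the entries $A_{ij}$ with $i \in S$, $j \in T$ (edges inside $S \cap T$ are then counted twice, as in the definition). For a fixed S the best T is obtained by taking all columns with positive sum or all with negative sum, so only the sets S are enumerated. The distance $d_1$ is not defined in this section; the code assumes it is the entrywise $\ell_1$ distance of the adjacency matrices divided by $n^2$, and the function names are hypothetical.

```python
from fractions import Fraction

def adjacency_matrix(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def cut_distance(A, B):
    """d_square(G, G') = max over S, T of |e_G(S,T) - e_G'(S,T)| / n^2 (brute force over S)."""
    n = len(A)
    best = 0
    for mask in range(1 << n):
        S = [i for i in range(n) if mask >> i & 1]
        col = [sum(A[i][j] - B[i][j] for i in S) for j in range(n)]
        positive = sum(x for x in col if x > 0)     # best T collects the positive columns
        negative = -sum(x for x in col if x < 0)    # or the negative ones
        best = max(best, positive, negative)
    return Fraction(best, n * n)

def d1(A, B):
    """Assumed definition: normalized entrywise l1 distance of the adjacency matrices."""
    n = len(A)
    return Fraction(sum(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n)), n * n)

M1 = adjacency_matrix(4, [(0, 1), (2, 3)])    # a perfect matching
M2 = adjacency_matrix(4, [(0, 2), (1, 3)])    # another perfect matching on the same nodes

print(cut_distance(M1, M2), d1(M1, M2))
```

In this example the two graphs differ in every edge, yet the positive and negative differences cancel inside most pairs S, T, so the cut distance is much smaller than $d_1$.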

4.2 Two graphs with the same number of nodes 

If $G$ and $G'$ are unlabeled unweighted graphs on different node sets but of the same cardinality
$n$, then we define their distance by

$$\hat\delta_\square(G, G') = \min_{\hat G, \hat G'} d_\square(\hat G, \hat G'), \qquad (28)$$

where $\hat G$ and $\hat G'$ range over all labelings of $G$ and $G'$ by $1, \dots, n$, respectively. (The hat above
the $\delta$ indicates that the "ultimate" definition will be somewhat different.)
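
As an illustration of minimizing over labelings, here is a brute-force sketch for tiny graphs. The helper names `d_box` and `delta_hat` are hypothetical; graphs are assumed to be given as symmetric 0/1 adjacency matrices. Since the cut distance on a common node set does not change when both graphs are relabeled by the same permutation, it is enough to relabel one of the two graphs.

```python
from itertools import permutations
import numpy as np

def d_box(A, B):
    """Cut distance of two graphs on the same node set (exhaustive over S)."""
    n = A.shape[0]
    D = A.astype(float) - B.astype(float)
    best = 0.0
    for mask in range(1, 1 << n):
        S = [i for i in range(n) if (mask >> i) & 1]
        col = D[S, :].sum(axis=0)
        best = max(best, col[col > 0].sum(), -col[col < 0].sum())
    return best / n**2

def delta_hat(A, B):
    """Minimum of d_box over all labelings; only B is relabeled (see lead-in)."""
    n = A.shape[0]
    return min(d_box(A, B[np.ix_(p, p)]) for p in map(list, permutations(range(n))))

# Two labelings of the path on 3 nodes: centre at node 1 versus centre at node 0.
P1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
P2 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(d_box(P1, P2) > 0, delta_hat(P1, P2) == 0)   # prints: True True
```

The example shows why the minimization is needed: the two matrices describe isomorphic graphs, yet their distance as labeled graphs is positive.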

4.3 Two arbitrary graphs 

Let $G = (V, E)$ and $G' = (V', E')$ be two graphs with (say) $V = [n]$ and $V' = [n']$. To define
their distance, we need a graph operation: for every graph $G$ and positive integer $m$, let $G(m)$
denote the graph obtained from $G$ by replacing each node of $G$ by $m$ nodes, where two new
nodes are connected if and only if their predecessors were.

We can use the distance $\hat\delta_\square$ to define the distance

$$\delta_\square(G, G') = \lim_{k \to \infty} \hat\delta_\square(G(kn'), G'(kn)).$$

(Here $G(kn')$ and $G'(kn)$ have the same number of nodes.)

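A more complicated but "finite" definition of the same quantity can be given as follows. A
fractional overlay of $G$ and $G'$ is a nonnegative $n \times n'$ matrix $X$ such that $\sum_{i=1}^{n} X_{iu} = 1/n'$ and
$\sum_{u=1}^{n'} X_{iu} = 1/n$. If $n = n'$ and $\sigma : V \to V'$ is a bijection, then $X_{iu} = \frac{1}{n} \mathbf{1}(\sigma(i) = u)$ is a fractional
overlay (which in this case is an honest-to-goodness overlay). We denote by $\mathcal{X}(G, G')$ the set of all
fractional overlays.

For a matrix $M$, let $\Sigma(M)$ denote the sum of its entries. Then the distance of the two graphs
can be described by the following optimization problem:

$$\delta_\square(G, G') = \min_{X \in \mathcal{X}(G,G')} \; \max_{Y, Z \subseteq V \times V'} \left| \sum_{\substack{iu \in Y,\ jv \in Z \\ ij \in E}} X_{iu} X_{jv} - \sum_{\substack{iu \in Y,\ jv \in Z \\ uv \in E'}} X_{iu} X_{jv} \right|. \qquad (29)$$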

To illuminate this definition a little, we can think of a fractional overlay as a coupling of
the uniform distribution on $V(G)$ with the uniform distribution on $V(G')$: it gives a probability
distribution $\chi$ on $V(G) \times V(G')$ whose marginals are uniform. Select two pairs $(i,u)$ and $(j,v)$
independently from the distribution $\chi$. Then the first sum in (29) is the probability that "$iu \in Y$ and $jv \in Z$
and $ij \in E$", and the second sum is the probability that "$iu \in Y$ and $jv \in Z$ and $uv$ is an edge".
Thus (29) expresses some form of correlation between $ij$ being an edge and $uv$ being an edge.

One word of warning: $\delta_\square$ is only a pseudometric, not a true metric, because $\delta_\square(G, G')$ may
be zero for different graphs $G$ and $G'$. This is the case e.g. if $G' = G(k)$ for some $k$.

Definition (29) can be extended to weighted graphs, but instead of going through the hairy
formulas, we postpone this to the next section.

We conclude with a problem for which only partial results are available. If $G$ and $G'$ have
the same number of nodes, then the definition of $\delta_\square$ does not give back $\hat\delta_\square$. It was proved in [29]
that

$$\delta_\square(G, G') \le \hat\delta_\square(G, G') \le 32\, \delta_\square(G, G')^{1/67}. \qquad (30)$$

This is a rather weak result, its significance being that $\delta_\square$ and $\hat\delta_\square$ define the same Cauchy
sequences. Alon (unpublished) proved that

$$\hat\delta_\square(G, G') \le (1 + o(1))\, \delta_\square(G, G') \qquad (31)$$

if $|V(G)| = |V(G')| \to \infty$. We conjecture:

Conjecture 4.1 For any two graphs $G$ and $G'$ on $n$ nodes, $\hat\delta_\square(G, G') \le 2\delta_\square(G, G')$.
An analogous result for the edit distance was proved by Pikhurko [TO] . 

4.4 Distance of graphons 

This notion of distance extends to graphons as follows (and it is perhaps more natural in that
context). We consider on $\mathcal{W}$ the cut norm

$$\|W\|_\square = \sup_{S,T \subseteq [0,1]} \left| \int_{S \times T} W(x,y)\, dx\, dy \right|,$$

where the supremum is taken over all measurable subsets $S$ and $T$. It is sometimes convenient
to use the corresponding metric $d_\square(U, W) = \|U - W\|_\square$. We define the cut distance

$$\delta_\square(U, W) = \inf_{\varphi} d_\square(U, W^{\varphi}),$$

where $\varphi$ ranges over all invertible measure preserving maps $[0, 1] \to [0, 1]$, and $W^{\varphi}(x,y) =
W(\varphi(x), \varphi(y))$.

The distance $\delta_\square$ of graphons is only a pseudometric, since different graphons can have distance
zero. This happens precisely when they are weakly isomorphic.

If $G$ and $G'$ are weighted graphs, then we have

$$\delta_\square(G, G') = \delta_\square(W_G, W_{G'}). \qquad (32)$$

This could serve as a more natural (but not combinatorial) definition of the distance of two
graphs, and we will use it to define the distance of two weighted graphs. Let $K$ denote the graph
with a single node of weight 1, endowed with a loop with weight 1/2. Then for a random graph
$G_n = \mathbb{G}(n, 1/2)$, we have $\delta_\square(G_n, K) = O(1/\sqrt{n})$ with large probability.

Going into all the complications with using the cut norm and then minimizing over measure
preserving transformations is justified by the following important fact.

Theorem 4.2 The pseudometric space $(\mathcal{W}_0, \delta_\square)$ is compact.

The proof depends on Szemerédi partitions, to be discussed in Section 5.
Convergence in the $\|\cdot\|_\square$ norm is stronger than weak-* convergence. To be more precise, if
$\|W_n - W\|_\square \to 0$ $(n \to \infty)$, then it follows immediately from the definition that

$$\int_{S \times T} W_n \to \int_{S \times T} W, \qquad (33)$$

and hence by standard arguments we get that

$$\int U \cdot W_n \to \int U \cdot W \qquad (34)$$

for every integrable function $U$. However, weak-* convergence is not equivalent to convergence
in the $\|\cdot\|_\square$ norm; a counterexample can be obtained e.g. from Example 6.19 (see [31]).

A similar construction can be applied to other norms, e.g., from the $L_1$-norm

$$\|W\|_1 = \int_{[0,1]^2} |W(x,y)|\, dx\, dy$$

we get

$$d_1(U, W) = \|U - W\|_1 \quad \text{and} \quad \delta_1(U, W) = \inf_{\varphi} d_1(U, W^{\varphi}).$$

5 Szemeredi partitions 

One of the most important tools in understanding large dense graphs is the Regularity Lemma 
of Szemerédi [112, 113] and its extensions. This lemma has many interesting connections to
other areas of mathematics, including analysis [87, 23] and information theory [114]. It also has
weaker (but more effective) and stronger versions. Here we survey as much as we need from this 
rich theory. 

5.1 $\varepsilon$-regular bipartite graphs and the original lemma

For a graph $G = (V, E)$ and for $X, Y \subseteq V$, let $e_G(X, Y)$ denote the number of edges with one
endnode in $X$ and another in $Y$; edges with both endnodes in $X \cap Y$ are counted twice. We
denote by $d_G(X, Y) = \frac{e_G(X,Y)}{|X| \cdot |Y|}$ the density of edges between $X$ and $Y$. If $X$ and $Y$ are disjoint,
we denote by $G[X, Y]$ the bipartite graph on $X \cup Y$ obtained by keeping just those edges of $G$
that connect $X$ and $Y$.

Let $\mathcal{P} = \{V_1, \dots, V_k\}$ be a partition of $V$. We say that $\mathcal{P}$ is an equipartition if $\lfloor |V|/k \rfloor \le
|V_i| \le \lceil |V|/k \rceil$ for all $1 \le i \le k$. We define the weighted graph $G_{\mathcal{P}}$ on $V$ by taking the complete
graph and weighting its edge $uv$ by $d_G(V_i, V_j)$ if $u \in V_i$ and $v \in V_j$.
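
The averaged graph $G_{\mathcal{P}}$ is easy to compute explicitly. Below is a small sketch, assuming the graph is a symmetric 0/1 adjacency matrix and the partition is a list of index lists; `averaged_graph` is an illustrative name. As a convenience of this sketch, the diagonal entries are filled with the same class density as the rest of the block.

```python
import numpy as np

def averaged_graph(A, classes):
    """Weighted adjacency matrix of G_P: entry (u, v) is d_G(V_i, V_j)
    whenever u lies in V_i and v lies in V_j."""
    n = A.shape[0]
    W = np.zeros((n, n))
    for Vi in classes:
        for Vj in classes:
            # e_G(V_i, V_j) is the block sum (edges inside V_i ∩ V_j count twice).
            density = A[np.ix_(Vi, Vj)].sum() / (len(Vi) * len(Vj))
            W[np.ix_(Vi, Vj)] = density
    return W

K22 = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]])
print(averaged_graph(K22, [[0, 1], [2, 3]]))   # reproduces K22 itself
print(averaged_graph(K22, [[0, 1, 2, 3]]))     # every entry is 8/16 = 0.5
```

With the natural bipartition the averaged graph loses no information, while the trivial one-class partition retains only the overall edge density.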

The Regularity Lemma says, roughly speaking, that every graph has a partition $\mathcal{P}$ into a
"small" number of classes such that $G_{\mathcal{P}}$ is "close" to $G$. There are (non-equivalent) forms of
this lemma, depending on how we measure the error.

Let $G$ be a bipartite graph with bipartition $\{U, W\}$, and let $d = d_G(U, W)$ denote its edge density. On the average, we expect that for
$X \subseteq U$ and $Y \subseteq W$,

$$e_G(X, Y) \approx d\, |X| \cdot |Y|.$$

For two arbitrary subsets of the nodes, $e_G(X, Y)$ may be very far from this "expected value", but
if $G$ is a random graph, then it will be close; random graphs are very "homogeneous"
in this respect. We say that $G$ is $\varepsilon$-regular, if

$$\left| \frac{e_G(X, Y)}{|X| \cdot |Y|} - d \right| \le \varepsilon \qquad (35)$$

holds for all subsets $X \subseteq U$ and $Y \subseteq W$ such that $|X| \ge \varepsilon|U|$ and $|Y| \ge \varepsilon|W|$.

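Notice that we could not require condition (35) to hold for small $X$ and $Y$: for example,
if both have one element, then the quotient $e_G(X, Y)/(|X| \cdot |Y|)$ is either 0 or 1. However, we
could replace it by the condition

$$\big| e_G(X, Y) - d\, |X| \cdot |Y| \big| \le \varepsilon |U| \cdot |W| \qquad (36)$$

for all $X \subseteq U$ and $Y \subseteq W$. Indeed, (35) implies (36) for $|X| \ge \varepsilon|U|$ and $|Y| \ge \varepsilon|W|$, while if
e.g. $|X| < \varepsilon|U|$, then $e_G(X, Y) \le \varepsilon|U| \cdot |W|$ and $d\, |X| \cdot |Y| \le \varepsilon|U| \cdot |W|$, so (36) holds trivially.
Conversely, if (36) holds with $\varepsilon$ replaced by $\varepsilon^3$, then

$$\left| \frac{e_G(X, Y)}{|X| \cdot |Y|} - d \right| \le \frac{\varepsilon^3 |U| \cdot |W|}{|X| \cdot |Y|} \le \varepsilon$$

if $|X| \ge \varepsilon|U|$ and $|Y| \ge \varepsilon|W|$.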
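
Condition (36), which compares $e_G(X,Y)$ with $d\,|X|\cdot|Y|$ up to an error of $\varepsilon|U|\cdot|W|$ for all $X \subseteq U$, $Y \subseteq W$, can be checked by brute force on small bipartite graphs. The sketch below assumes the graph is given by its $|U| \times |W|$ 0/1 biadjacency matrix and takes $d$ to be the overall density $d_G(U,W)$; the names `regularity_defect` and `is_regular` are hypothetical.

```python
import numpy as np

def regularity_defect(B):
    """max over X ⊆ U, Y ⊆ W of |e_G(X,Y) - d|X||Y|| / (|U||W|), with d = d_G(U,W).

    B is the |U| x |W| biadjacency matrix; exhaustive over X, so tiny inputs only.
    """
    nu, nw = B.shape
    D = B - B.mean()                         # D[i, j] = a(i, j) - d
    best = 0.0
    for mask in range(1, 1 << nu):
        X = [i for i in range(nu) if (mask >> i) & 1]
        col = D[X, :].sum(axis=0)
        # For fixed X the extremal Y takes all positive (or all negative) columns.
        best = max(best, col[col > 0].sum(), -col[col < 0].sum())
    return best / (nu * nw)

def is_regular(B, eps):
    """True if the condition (36) holds for this eps."""
    return regularity_defect(B) <= eps

print(regularity_defect(np.ones((3, 3))))   # complete bipartite graph: 0.0
print(regularity_defect(np.eye(2)))         # perfect matching on 2 + 2 nodes: 0.125
```

The returned value is the smallest $\varepsilon$ for which (36) holds, so `is_regular(B, eps)` is just a threshold test on it.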

With these definitions, the Regularity Lemma can be stated as follows: 

Lemma 5.1 (Szemerédi Regularity Lemma, usual form) For every $\varepsilon > 0$ there is a
$k(\varepsilon)$ such that every graph $G = (V, E)$ on at least $k(\varepsilon)$ nodes has an equipartition $\{V_1, \dots, V_k\}$
($1/\varepsilon \le k \le k(\varepsilon)$) such that for all but $\varepsilon k^2$ pairs of indices $1 \le i < j \le k$, the bipartite graph
$G[V_i, V_j]$ is $\varepsilon$-regular.

One feature of the Regularity Lemma, which unfortunately forbids practical applications, is
that $k(\varepsilon)$ is very large: the best proof gives a tower of exponentials whose height is polynomial in $1/\varepsilon$, and unfortunately this
is not far from the truth, as was shown by Gowers [60].

5.2 Weak Regularity Lemma and distance of graphs 

A version with a weaker conclusion but with a more reasonable error bound was proved by Frieze
and Kannan [52].

Lemma 5.2 (Weak Regularity Lemma) For every $k \ge 1$ and every graph $G = (V, E)$, $V$
has a partition $\mathcal{P}$ into $k$ classes such that

$$d_\square(G, G_{\mathcal{P}}) \le \frac{2}{\sqrt{\log k}}.$$

Note that we do not require here that $\mathcal{P}$ be an equipartition; it is not hard to see that this
version implies that there is also an equipartition with a similar property, only we have to increase
the error bound to $4/\sqrt{\log k}$.

To see the connection with the original lemma, we note that if $G$ is an $\varepsilon$-regular bipartite
graph, say in the sense of (36), and $H$ is the weighted complete bipartite graph with the same
bipartition $\{U, W\}$ and with edge weights $d$, then (36) says that $d_\square(G, H) \le \varepsilon$. Hence if $\mathcal{P}$ is a
Szemerédi partition in the sense of Lemma 5.1, then the distance between the bipartite subgraph
of $G$ induced by $V_i$ and $V_j$, and the corresponding weighted bipartite subgraph of $G_{\mathcal{P}}$, is at most
$\varepsilon$ for all but $\varepsilon k^2$ pairs and at most 1 for the remaining $\varepsilon k^2$ pairs. It is easy to see that
this implies that the distance between $G$ and $G_{\mathcal{P}}$ is at most $\varepsilon$. So the partition in Lemma 5.2
has indeed weaker properties than the partition in Lemma 5.1. Of course, this is compensated
for by the relatively decent number of partition classes.

If we keep in $G_{\mathcal{P}}$ an edge with weight $p$ with probability $p$ and delete it with probability $1 - p$,
then we get a random graph $H$, and it is easy to see that with large probability $d_\square(G_{\mathcal{P}}, H) =
O(1/\sqrt{|V(G)|})$. This implies the following version of the Weak Regularity Lemma:

Lemma 5.3 For every $k \ge 1$ and graph $G$, there is a graph $H$ with $k$ nodes such that
$\delta_\square(G, H) = O(1/\sqrt{\log k})$.

5.3 Strong Regularity Lemma and compactness 

Other versions of the Regularity Lemma strengthen, rather than weaken, the conclusion (of
course, at the cost of replacing the tower function by an even more formidable value). Such a
"super-strong" Regularity Lemma was proved by Alon, Fischer, Krivelevich and Szegedy [5]. We
state the following equivalent version from [87].

Lemma 5.4 (Strong Regularity Lemma) For every sequence $(\varepsilon_0, \varepsilon_1, \dots)$ of positive numbers
there is a positive integer $k_0$ such that for every graph $G = (V, E)$, there is a graph $G'$ on $V$,
and $V$ has a partition $\mathcal{P}$ into $k \le k_0$ classes such that

$$d_1(G, G') \le \varepsilon_0 \quad \text{and} \quad d_\square(G', G'_{\mathcal{P}}) \le \varepsilon_k. \qquad (37)$$

Note that the first inequality involves the normalized edit distance, and so it is stronger than
a similar condition with the cut distance would be. The second error bound $\varepsilon_k$ in (37) can be
thought of as very small. If we choose $\varepsilon_k = \varepsilon_0$, we get the Frieze-Kannan version 5.2 (with $\varepsilon = 2\varepsilon_0$).
Choosing $\varepsilon_k = \varepsilon_0^2/k^2$, the partition obtained satisfies the requirements of the original Regularity
Lemma 5.1.

The strong version itself follows rather easily from the compactness of the space $(\mathcal{W}_0, \delta_\square)$
(Theorem 4.2); see [85] for details.

5.4 Partitions into sets with small diameter

5.4.1 Small diameter sets and regularity 

We can equip every graph $G = (V, E)$ with a metric as follows. Let $A$ be the adjacency matrix of
$G$. We define the similarity distance $d_2(i,j)$ of two nodes $i, j \in V$ as the (suitably normalized) $\ell_1$ distance of the corresponding
rows of $A^2$ (squaring the matrix seems unnatural, but it is crucial; it turns out to get rid of
random fluctuations). The following was proved (in somewhat different form) in [87].

Theorem 5.5 Let $G$ be a graph and let $\mathcal{P} = \{V_1, \dots, V_k\}$ be a partition of $V$.

(a) If $d_\square(G, G_{\mathcal{P}}) = \varepsilon$, then there is a set $S \subseteq V$ with $|S| \le 8\sqrt{\varepsilon}\,|V|$ such that for each partition
class, $V_i \setminus S$ has diameter at most $8\sqrt{\varepsilon}$ in the $d_2$ metric.

(b) If there is a set $S \subseteq V$ with $|S| \le \delta|V|$ such that for each partition class, $V_i \setminus S$ has
diameter at most $\delta$ in the $d_2$ metric, then $d_\square(G, G_{\mathcal{P}}) \le 24\delta$.

Theorem 5.5 suggests defining the dimension of a family $\mathcal{G}$ of graphs as the infimum of real
numbers $d > 0$ for which the following holds: for every $\varepsilon > 0$ and $G \in \mathcal{G}$ the node set of $G$ can
be partitioned into a set of at most $\varepsilon|V(G)|$ nodes and into at most $\varepsilon^{-d}$ sets of diameter at most
$\varepsilon$. (This number can be infinite.) In the cases when the graphs have a natural dimensionality,
this dimension tends to give the right value. For example, let $G_n$ be obtained by selecting $n$
random points on the $d$-dimensional unit sphere, and connecting two of these points $x$ and $y$
with a probability $W(x,y)$, which is a continuous function of $x$ and $y$. With probability 1, this
sequence has dimension $\Theta(d)$.

5.4.2 Computational applications 

As an easy application of Theorem 5.5, we give an algorithm to compute a weak Szemerédi
partition in a huge graph. Our goal is to illustrate how an algorithm works in the pure sampling
model, as well as in what form the result can be returned. This way of presenting the output of
an algorithm for a large graph was proposed by Frieze and Kannan [52].

We start with an auxiliary algorithm that computes (approximately) the d2 distance of two 
nodes. 

Algorithm 5.6 

Input: A graph $G$ given by a sampling oracle, two nodes $u, v \in V$, and an error bound
$\varepsilon > 0$.

Output: A number $D_2(u,v) \ge 0$ such that with probability at least $1 - \varepsilon$,

$$D_2(u, v) - \varepsilon \le d_2(u, v) \le D_2(u, v) + \varepsilon.$$

To see how this can be done, we rewrite the definition of the $d_2$ distance as follows. For
$x, y \in V(G)$, let $a(x, y)$ be the corresponding entry of the adjacency matrix of $G$: this is 1 if they
are adjacent and 0 otherwise. Define

$$a_2(x,y) = \mathsf{E}_z\, a(x,z)\, a(y,z),$$

where $z$ is a uniform random node in $V$; this is the corresponding entry of the square of the
adjacency matrix, normalized by $|V(G)|$. Finally, let

$$d_2(x, y) = \mathsf{E}_z\big( |a_2(x, z) - a_2(y, z)| \big),$$

where again $z$ is a uniform random node in $V$. Drawing a sufficiently large sample (depending
on $\varepsilon$), these expectations can be approximated by averaging.
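
The averaging idea can be sketched as a nested sampling estimator: an outer sample over $z$ for the expectation defining $d_2$, and for each $z$ an inner sample over $w$ for $a_2(u,z) - a_2(v,z) = \mathsf{E}_w\, a(z,w)\,(a(u,w) - a(v,w))$. This is only an illustration under assumptions: the oracle is modeled as a list of nodes plus an adjacency function `a`, the function name `estimate_d2` and the fixed sample size are hypothetical, and no attempt is made to derive the sample size from the error bound.

```python
import random

def estimate_d2(nodes, a, u, v, samples=200, rng=random):
    """Monte Carlo estimate of d_2(u, v) using only adjacency queries a(x, y)."""
    total = 0.0
    for _ in range(samples):
        z = rng.choice(nodes)                      # outer sample: E_z | ... |
        inner = 0
        for _ in range(samples):
            w = rng.choice(nodes)                  # inner sample: a_2(u,z) - a_2(v,z)
            inner += a(z, w) * (a(u, w) - a(v, w))
        total += abs(inner) / samples
    return total / samples

# Toy oracle: the complete bipartite graph with sides {0, 1} and {2, 3}.
side = {0: 0, 1: 0, 2: 1, 3: 1}
a = lambda x, y: int(side[x] != side[y])
nodes = list(side)
print(estimate_d2(nodes, a, 0, 1, rng=random.Random(0)))   # twin nodes: exactly 0.0
print(estimate_d2(nodes, a, 0, 2, rng=random.Random(0)))   # close to the true value 0.5
```

Taking the absolute value of a noisy inner average introduces a small upward bias when the true difference is near zero; larger inner samples reduce it.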

Algorithm 5.6 enables us to encode a partition of $V(G)$ as a subset $R \subseteq V(G)$: for each
$r \in R$, we define the partition class $V_r$ as the set of nodes $u \in V$ such that the node in $R$ closest
to $u$ is $r$. Ties will be broken arbitrarily, and nodes to which there are several "almost closest"
nodes may be misclassified, but this is the best one can hope for. To formalize,

Algorithm 5.7

Input: A graph $G$ given by a sampling oracle, a subset $R \subseteq V(G)$, a node $u \in V$, and an
error bound $\varepsilon > 0$.

Output: An $r \in R$ such that with probability at least $1 - \varepsilon$, $d_2(u,r) \le (1 + \varepsilon)\, d_2(u,R)$, where $d_2(u,R) = \min_{r' \in R} d_2(u,r')$.

The way this second algorithm works is that it uses Algorithm 5.6 to compute (approximately)
the distances $d_2(u, r)$, $r \in R$, and returns the node $r \in R$ that it finds closest to $u$. Borrowing a
phrase from geometry, we compute the Voronoi cells of the set $R$.

Using this encoding of the partition, the following algorithm computes a weak Szemeredi 
partition. 

Algorithm 5.8 

Input: A graph $G$ given by a sampling oracle, and an error bound $\varepsilon$.

Output: A set $R \subseteq V(G)$ with $|R| \le 2^{O(1/\varepsilon^2)}$ such that, with probability at least $1 - \varepsilon$,
$d_2(u, R) \le \varepsilon$ for all but an $\varepsilon$ fraction of the nodes $u$.

The set $R$ is grown step by step, starting with the empty set. At each step, a new uniform
random node $w$ of $G$ is generated, and the approximate distances $D_2(w, r)$ are computed for all
$r \in R$ with error less than $\varepsilon/|R|$. If all of these are larger than $\varepsilon/2$, $w$ is added to $R$. Else,
$w$ is thrown out and a new random node is generated. If $R$ is not increased in a sufficiently long run of consecutive steps (whose length depends only on $\varepsilon$), the
algorithm halts.

It is clear that if more than an $\varepsilon$ fraction of the nodes are farther than $\varepsilon$ from $R$, then in
that many iterations we are very likely to sample one of these, and then with large probability we get
the distances right and so we increase $R$.
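
The greedy growth of $R$ and the Voronoi-style encoding of the partition can be sketched as follows. To keep the example short it departs from the sampling model in two labeled ways: the $d_2$ distances are computed exactly from the adjacency matrix rather than estimated, and the halting rule uses a placeholder parameter `patience` (the number of consecutive unsuccessful samples), which is an assumption of this sketch. All function names are illustrative.

```python
import random
import numpy as np

def d2_matrix(A):
    """Exact similarity distances: d_2(x, y) = E_z |a_2(x, z) - a_2(y, z)|."""
    n = A.shape[0]
    A2 = (A @ A) / n                               # a_2(x, y) = E_z a(x, z) a(y, z)
    return np.abs(A2[:, None, :] - A2[None, :, :]).mean(axis=2)

def grow_representatives(A, eps, patience=50, rng=random):
    """Add a random node w to R whenever it is farther than eps/2 from all of R."""
    d2 = d2_matrix(A)
    n = A.shape[0]
    R, idle = [], 0
    while idle < patience:                         # placeholder halting rule
        w = rng.randrange(n)
        if all(d2[w, r] > eps / 2 for r in R):
            R.append(w)
            idle = 0
        else:
            idle += 1
    return R

def voronoi_class(A, R, u):
    """Representative in R closest to u; this encodes the partition class of u."""
    d2 = d2_matrix(A)
    return min(R, key=lambda r: d2[u, r])

K22 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]])
R = grow_representatives(K22, eps=0.5, rng=random.Random(1))
print(R, [voronoi_class(K22, R, u) for u in range(4)])
```

On the complete bipartite graph the two sides are at $d_2$ distance 0.5 from each other and twins are at distance 0, so $R$ ends up with one representative per side and the Voronoi cells recover the bipartition.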

Theorem 5.5 says in this context that the partition determined by Algorithms 5.6-5.8 satisfies
$d_\square(G, G_{\mathcal{P}}) \le (4\varepsilon)^{1/4}$ with large probability.

We conclude with an answer to Question 4 in Section 1.2. For the partition $\mathcal{P}$ implicitly
determined above, we can also compute the edge densities between the partition classes, which
we use to weight the edges of the complete graph on $R$, so that we get a weighted graph $H$. We
find the maximum cut in $H$ by brute force, to get a partition $R = R_1 \cup R_2$. This gives an implicit
definition of a cut in $G$, where a node $u$ is put on the left side of the cut iff $D_2(u, R_1) \le D_2(u, R_2)$
for the approximate distances computed by Algorithm 5.6.

5.5 Regularity Lemmas for bounded degree graphs?

The Regularity Lemma as discussed above does not say anything for non-dense graphs. Several
extensions for this case are known [71, 54], but they are meaningless for graphs that are very
sparse, in particular if they have bounded degree.

Is there a Regularity Lemma for graphs with bounded degree? There are great difficulties 
here, but three results justify cautious optimism. 

An observation of Alon (unpublished) implies that a weak analogue of the Regularity Lemma,
version 5.3, holds. Using the sampling distance introduced in Section [TT3T] we can state this as
follows:

Proposition 5.9 For every $d \ge 1$ and $\varepsilon > 0$ there is an $n = n(d, \varepsilon)$ such that for every graph $G$
with degrees bounded by $d$ there is a graph $H$ with degrees bounded by $d$ and $|V(H)| \le n$, such
that $d_{\rm sample}(G, H) \le \varepsilon$.

Unfortunately, no effective bound on $n$ follows from the proof. It would be very interesting
to give any explicit bound on the function $n(d, \varepsilon)$, or to give an algorithm to construct $H$ from
$G$. Ideally, one would like to design an algorithm that works in the sampling framework,
just as the algorithm in Section 5.4.2 works in the dense case.

It was proved recently by Elek and Lippner [41], and independently by Angel and Szegedy
[TT], that every graph with degrees bounded by $d$ can be decomposed by deleting $\varepsilon n$ edges into
"highly homogeneous" parts, where the number of these parts is bounded by a function of $d$ and
$\varepsilon$. Unfortunately, the highly homogeneous parts can still have a complex structure, but this may
be a first important step in the direction of finding an analogue of the Regularity Lemma.

A third idea of decomposition is related to Følner sequences in the theory of amenable groups,
and is called hyperfiniteness for general graph sequences [40, 102]. A family $\mathcal{G}$ of graphs with
bounded degree is called hyperfinite, if for every $\varepsilon > 0$ there is a $k_\varepsilon \ge 1$ such that from every
graph $G \in \mathcal{G}$ we can delete $\varepsilon|V(G)|$ edges so that every connected component of the remaining
graph has at most $k_\varepsilon$ nodes. Schramm [102] showed that for a convergent graph sequence,
hyperfiniteness is reflected by the limit object.

A special case of a hyperfinite family is a family $\mathcal{G}$ of graphs with subexponential growth,
familiar from group theory. This property is defined by requiring that there is a function $f : \mathbb{N} \to
\mathbb{N}$ such that $(\ln f(m))/m \to 0$ $(m \to \infty)$, and for any graph $G \in \mathcal{G}$, any $v \in V(G)$ and any $m \in \mathbb{N}$,
the number of nodes in the $m$-neighborhood of $v$ is at most $f(m)$.

It is likely that large real-life networks can be thought of as hyperfinite; on the other hand, 
hyperfinite families of graphs seem to be much better behaved, and some of the theory of dense 
graph sequences can be extended at least to this case. 

6 Convergence and limits I: the dense case 
6.1 Subgraph sampling

Recall that we can define a notion of convergence if we fix a sampling method. For dense graphs,
we use subgraph sampling: we select uniformly a random $k$-element subset of $V(G)$, and return
the subgraph $G[k]$ induced by them. The probability that we see a given graph $F$ is the quantity
$t_{\rm ind}(F, G)$ introduced in (3). A sequence of graphs $(G_n)$ with $|V(G_n)| \to \infty$ is convergent if the
induced subgraph densities $t_{\rm ind}(F, G_n)$ converge for all finite graphs $F$.

We use this sampling method for dense graphs (otherwise all these densities tend to 0).

Instead of the induced subgraph densities $t_{\rm ind}(F, G_n)$, we could use the subgraph densities
$t_{\rm inj}(F, G_n)$ or the homomorphism densities $t(F, G_n)$. Indeed, the subgraph densities can be
expressed as linear combinations of induced subgraph densities and vice versa, while the difference
$t(F, G) - t_{\rm inj}(F, G) = O(1/|V(G)|)$, and so it tends to 0 if $|V(G)| \to \infty$.
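
The gap between the homomorphism density $t(F,G)$ and the injective density $t_{\rm inj}(F,G)$ can be observed directly on small examples. The sketch below assumes the usual definitions (the probability that a uniformly random map, respectively a uniformly random injective map, from $V(F)$ to $V(G)$ preserves edges); the function names and the edge-list encoding of $F$ are illustrative.

```python
from itertools import product, permutations
import numpy as np

def hom_density(F_edges, k, A):
    """t(F, G): fraction of all maps [k] -> V(G) that send edges of F to edges of G."""
    n = A.shape[0]
    hits = sum(all(A[phi[a], phi[b]] for a, b in F_edges)
               for phi in product(range(n), repeat=k))
    return hits / n**k

def inj_density(F_edges, k, A):
    """t_inj(F, G): the same fraction, restricted to injective maps."""
    n = A.shape[0]
    maps = list(permutations(range(n), k))
    hits = sum(all(A[phi[a], phi[b]] for a, b in F_edges) for phi in maps)
    return hits / len(maps)

def complete_graph(n):
    return np.ones((n, n), dtype=int) - np.eye(n, dtype=int)

edge = [(0, 1)]                                   # F = a single edge on nodes 0, 1
for n in (5, 10, 20):
    K = complete_graph(n)
    print(n, inj_density(edge, 2, K) - hom_density(edge, 2, K))   # 1/n up to rounding
```

For the complete graph $K_n$ and a single edge $F$, the injective density is 1 and the homomorphism density is $1 - 1/n$, so the difference is exactly of order $1/|V(G)|$.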

We can extend this sampling procedure to graphons, and we get to the construction of W- 
random graphs. 

6.2 Convergence in distance 

The definition of convergence can be reformulated using the notion of sampling distance [T] 
a sequence $(G_n)$ of simple graphs with $|V(G_n)| \to \infty$ is convergent if for every graph $F$,
$(t_{\rm ind}(F, G_n) : n = 1, 2, \dots)$ is a Cauchy sequence (equivalently, $(t(F, G_n) : n = 1, 2, \dots)$ is
a Cauchy sequence). This is equivalent to saying that the graph sequence is Cauchy in the
$d_{\rm sample}$ metric. The following theorem, which is one of the main results in this theory, justifies
the use of the cut metric $\delta_\square$.

Theorem 6.1 A sequence $(G_n)$ of simple graphs ($|V(G_n)| \to \infty$) is convergent if and only if it
is a Cauchy sequence in the metric $\delta_\square$.

A quantitative form of this equivalence is given by the following theorem. Part (a) is a 
generalization of what is called the "Counting Lemma" in the theory of Szemeredi partitions; 
part (b) may be called the "Anti-counting" lemma. 

Theorem 6.2 Let $U, W \in \mathcal{W}_0$.

(a) For every simple finite graph $F$,

$$|t(F, U) - t(F, W)| \le |E(F)| \cdot \delta_\square(U, W).$$

(b) Let $k$ be a positive integer, and assume that for every simple graph $F$ on $k$ nodes, we
have

$$|t(F, U) - t(F, W)| \le 2^{-k^2}.$$

Then

$$\delta_\square(U, W) \le \frac{20}{\sqrt{\log k}}.$$

The proof of part (a) is quite simple; part (b) depends on the sampling lemmas to be discussed
in Section 6.4.

Theorem 6.1 can be generalized to characterize convergence in the space $\mathcal{W}_0$:

Theorem 6.3 Let $(W_n)$ be a sequence of graphons in $\mathcal{W}_0$ and let $W \in \mathcal{W}_0$. Then $t(F, W_n)$
converges for all finite simple graphs $F$ if and only if $(W_n)$ is a Cauchy sequence in the $\delta_\square$ metric.
Furthermore, $t(F, W_n) \to t(F, W)$ for all finite simple graphs $F$ if and only if $\delta_\square(W_n, W) \to 0$.

6.3 Convergence from the right 

Convergence of a graph sequence can also be characterized in terms of mappings "to the right" . 
Several characterizations along these lines were given in [30]; here we state one:

Theorem 6.4 Let $(G_n)$ be a sequence of simple graphs such that $|V(G_n)| \to \infty$ as $n \to \infty$.
Then the sequence $(G_n)$ is left-convergent if and only if the sequence $\mathcal{E}(G_n, H)$ is convergent for
every weighted graph $H$.

6.4 Sampling and distance

The proof of the results in the previous section depends on a couple of probabilistic lemmas, 
which relate sampling to cut distance. The first of these lemmas is due to Alon, Fernandez de la 
Vega, Kannan and Karpinski [6], with an improvement from [29]. Its proof is quite involved. Its 
main implication is that the $d_\square$-distance of two graphs $G$ and $H$ on the same set of nodes can
be estimated by sampling. It should be noted that the bound given is quite sharp. 

Lemma 6.5 Let $k$ be a positive integer and let $G$ and $H$ be graphs with $V(G) = V(H)$, $|V(G)| \ge
k$ and edge weights in $[0, 1]$. Let $S$ be chosen uniformly from all subsets of $V(G)$ of size $k$. Then
with probability at least $1 - 2e^{-\sqrt{k}/8}$,

$$\big| d_\square(G[S], H[S]) - d_\square(G, H) \big| \le \frac{10}{k^{1/4}}.$$

The second lemma about sampling shows that a sample is close to the original graph
with large probability. Note that here we have to use the $\delta_\square$ distance (since no overlaying is
given a priori), and also that the bound on the distance is much weaker than in the previous
lemma.

Lemma 6.6 Let $k \ge 1$, and let $G$ be a simple graph on at least $k$ nodes. If $S$ is a random subset
of $V(G)$ of size $k$, then with probability at least $1 - 2^{-k}$,

$$\delta_\square(G, G[S]) \le \frac{10}{\sqrt{\log k}}.$$

This lemma follows from Lemma 6.5 and the Weak Regularity Lemma 5.2. Let us sketch this
proof.

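Proof. Fix some $m \ge 1$. By Lemma 5.2 there is an equipartition $\mathcal{P} = \{V_1, \dots, V_m\}$ of $V(G)$
into $m$ classes such that

$$d_\square(G, G_{\mathcal{P}}) \le \frac{4}{\sqrt{\log m}}.$$

Now let $S$ be a random $k$-subset. By Lemma 6.5 we have

$$\big| d_\square(G[S], G_{\mathcal{P}}[S]) - d_\square(G, G_{\mathcal{P}}) \big| \le \frac{10}{k^{1/4}}$$

with large probability. If $k$ is sufficiently large relative to $m$, then every class $V_i$ will contain
about $k/m$ nodes from $S$. Indeed, a simple application of Chebyshev's Inequality gives that with
probability at least 3/4,

$$\left| |V_i \cap S| - \frac{k}{m} \right| \le 2\sqrt{k}$$

holds for all $i$.

Now blow up each node of $G_{\mathcal{P}}[S]$ into $n/k$ twins to get a weighted graph $G'$ (in notation:
$G' = G_{\mathcal{P}}[S](n/k)$). Then each set $V_i \cap S$ is blown up into a set $V_i'$ of size $\frac{n}{k}|V_i \cap S| \approx |V_i| = \frac{n}{m}$. In
fact,

$$\sum_{i=1}^{m} \left| |V_i'| - \frac{n}{m} \right| \le 2\sqrt{k}\, m\, \frac{n}{k} = \frac{2mn}{\sqrt{k}}.$$

It follows that we can overlay $G'$ and $G_{\mathcal{P}}$ so that corresponding edges have the same weight
except for edges inside the classes $V_i$ and edges incident with at most $2mn/\sqrt{k}$ nodes.
This is only a fraction of $\frac{1}{m} + \frac{4m}{\sqrt{k}}$ of all edges, which shows that

$$\delta_\square(G_{\mathcal{P}}, G_{\mathcal{P}}[S]) \le \frac{1}{m} + \frac{4m}{\sqrt{k}}.$$

Hence

$$\delta_\square(G, G[S]) \le \delta_\square(G, G_{\mathcal{P}}) + \delta_\square(G_{\mathcal{P}}, G_{\mathcal{P}}[S]) + \delta_\square(G_{\mathcal{P}}[S], G[S]) \le \frac{4}{\sqrt{\log m}} + \frac{10}{k^{1/4}} + \left( \frac{1}{m} + \frac{4m}{\sqrt{k}} \right).$$

Choosing $m = k^{1/4}$, we get

$$\delta_\square(G, G[S]) \le \frac{8}{\sqrt{\log k}} + \frac{10}{k^{1/4}} + \left( \frac{1}{k^{1/4}} + \frac{4}{k^{1/4}} \right) < \frac{10}{\sqrt{\log k}}$$

if $k$ is large enough. □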



Both Lemmas 6.5 and 6.6 extend to graphons. We only formulate the second one, which can
be stated in terms of the $W$-random graphs $\mathbb{G}(k, W)$.

Lemma 6.7 Let $k \ge 1$, and let $W$ be a graphon. Then with probability at least $1 - 2^{-k}$,

$$\delta_\square(\mathbb{G}(k, W), W) \le \frac{10}{\sqrt{\log k}}.$$



To illustrate how these lemmas fit in the proofs, let us first sketch how Lemma 6.7 implies
the "anti-counting lemma" (Theorem 6.2(b)). Assume that $U, W \in \mathcal{W}_0$ satisfy

$$|t(F, U) - t(F, W)| \le 2^{-k^2}$$

for every graph $F$ with $k$ nodes. In terms of the $W$-random graphs $\mathbb{G}(k, U)$ and $\mathbb{G}(k, W)$, this
implies (by inclusion-exclusion) that

$$\big| \Pr(\mathbb{G}(k, U) = F) - \Pr(\mathbb{G}(k, W) = F) \big| \le 2^{\binom{k}{2}}\, 2^{-k^2},$$

and hence

$$\sum_F \big| \Pr(\mathbb{G}(k, U) = F) - \Pr(\mathbb{G}(k, W) = F) \big| \le 2^{k(k-1)}\, 2^{-k^2} = 2^{-k}.$$

This means that we can couple $\mathbb{G}(k, U)$ and $\mathbb{G}(k, W)$ so that $\mathbb{G}(k, U) = \mathbb{G}(k, W)$ with probability
at least $1 - 2^{-k}$. Lemma 6.7 implies that with probability at least $1 - 2^{-k}$, we have

$$\delta_\square(U, \mathbb{G}(k, U)) \le \frac{10}{\sqrt{\log k}},$$

and a similar assertion holds for $W$. Whenever all three happen, we get

$$\delta_\square(U, W) \le \delta_\square(U, \mathbb{G}(k, U)) + \delta_\square(\mathbb{G}(k, U), \mathbb{G}(k, W)) + \delta_\square(W, \mathbb{G}(k, W)) \le \frac{20}{\sqrt{\log k}}.$$



6.5 Dense limit 

The main motivation behind considering graphons is the following theorem [85]:

Theorem 6.8 For any convergent sequence $(G_n)$ of simple graphs there exists a graphon $W$
such that $t(F, G_n) \to t(F, W)$ for every simple graph $F$.

We say that this graphon $W$ is the limit of the graph sequence, and write $G_n \to W$.

One might wonder if we really need complicated objects like integrable functions to describe 
these limits; would perhaps piecewise linear, or monotone, or continuous functions suffice? The 
following two results tell us that (up to weak isomorphism) all measurable functions are needed: 
every graphon $W$ can be obtained as the limit of a sequence of simple graphs, and the limit
is essentially unique [55].

Theorem 6.9 For any $W \in \mathcal{W}_0$, the graph sequence $\mathbb{G}(n, W)$ converges to the graphon $W$ with
probability 1.
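
For experimentation, a $W$-random graph can be generated in a few lines. The sketch assumes the usual construction of $\mathbb{G}(n, W)$: draw independent uniform labels $x_1, \dots, x_n \in [0,1]$ and connect $i$ and $j$ independently with probability $W(x_i, x_j)$. The function name `w_random_graph` is illustrative, and `W` is assumed to be a symmetric function that accepts NumPy arrays and broadcasts.

```python
import numpy as np

def w_random_graph(n, W, rng=None):
    """Sample the adjacency matrix of a W-random graph on n nodes."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(n)                         # i.i.d. uniform labels in [0, 1)
    P = W(x[:, None], x[None, :])             # P[i, j] = W(x_i, x_j)
    coin = rng.random((n, n))
    upper = np.triu(coin < P, k=1)            # decide each pair i < j exactly once
    return (upper | upper.T).astype(int)      # symmetric, no loops

rng = np.random.default_rng(0)
A = w_random_graph(200, lambda x, y: 1 - np.maximum(x, y), rng)
print(A.sum() / (200 * 199))                  # edge density of the sample
```

For the graphon $W(x,y) = 1 - \max(x,y)$ used in the usage line, the integral of $W$ over the unit square is $1/3$, so the printed density should be near that value for large $n$.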

On the other hand, Theorem 3.1 implies:

Theorem 6.10 ([26]) The limit graphon of a convergent graph sequence is uniquely determined 
up to weak isomorphism. 

There are two quite different proofs of the (main) Theorem 6.8. The original one in [85] uses
Szemerédi partitions and the Martingale Convergence Theorem; a more recent proof by Elek
and Szegedy [42] first constructs a different limit object in the form of an uncountable graph by
taking the ultraproduct, and then obtains the graphon as an appropriate projection of this (in
terms of non-standard analysis, the graphon is a non-standard Szemeredi partition of this graph 
on a non-standard [0, 1] interval). 

The first proof has the obvious advantage of being constructive; but the second proof is very
general, it extends to hypergraphs and many other structures, and leads to new understanding 
of the Regularity Lemma for hypergraphs [501 (SIl 1100] and its consequences [115) . 

Convergence to the limit object can also be characterized by the distance function introduced 
above [23]:

Theorem 6.11 For a sequence $(G_n)$ of graphs with $|V(G_n)| \to \infty$ and graphon $W$, we have
$G_n \to W$ if and only if $\delta_\square(W_{G_n}, W) \to 0$.

Note that the function $W_{G_n}$ depends on the labeling of the nodes of $G_n$ (the distance
$\delta_\square(W_{G_n}, W)$ does not, since relabeling $G_n$ results in a weak isomorphism of $W_{G_n}$). Choosing
the labeling appropriately, we can say more:

Theorem 6.12 For a sequence $(G_n)$ of graphs with $|V(G_n)| \to \infty$ and graphon $W$, we have
$G_n \to W$ if and only if the graphs $G_n$ can be labeled so that $\|W_{G_n} - W\|_\square \to 0$.

6.5.1 Equivalent descriptions of the limit 

A random graph model is a probability distribution on simple graphs on $[n]$, for every $n \ge 1$,
which is invariant under the reordering of the nodes. In other words, it is a sequence of random
variables $\mathbb{G}_n$, whose values are simple graphs on $[n]$, and isomorphic graphs have the same
probability. We say that a random graph model is consistent if deleting node $n$ from $\mathbb{G}_n$, the
distribution of the resulting graph is the same as the distribution of $\mathbb{G}_{n-1}$. We say that the
model is local, if for every $1 \le k < n$, the subgraphs of $\mathbb{G}_n$ induced by $[k]$ and $\{k + 1, \dots, n\}$ are
independent as random variables.

It is easy to see that for every graphon $W \in \mathcal{W}_0$, $\mathbb{G}(n, W)$ is a consistent and local random
graph model.

A related notion is the following. Let $\mathcal{G}$ be the set of graphs on $\mathbb{N}$; we can think of $\mathcal{G}$ as the
product space $\{0, 1\}^E$, where $E = \binom{\mathbb{N}}{2}$ is the set of all (unordered) pairs of elements of $\mathbb{N}$. This
also equips $\mathcal{G}$ with a $\sigma$-algebra. Let $\Sigma$ be the group of permutations of $\mathbb{N}$, and let $\Sigma_2$ be the
action of $\Sigma$ on $E$. Recall that a probability measure $\pi$ on $\mathcal{G}$ is called ergodic with respect to $\Sigma_2$
if it is invariant under $\Sigma_2$ and $\mathcal{G}$ has no measurable subset $\mathcal{G}'$ with $0 < \pi(\mathcal{G}') < 1$ invariant under
$\Sigma_2$.

It is easy to see that for every $W \in \mathcal{W}_0$, the random graph $\mathbb{G}(W)$ defines a probability measure
on $\mathcal{G}$ invariant under $\Sigma_2$. B. Szegedy [111] showed that this measure is also ergodic.

After this preparation, we can formulate the theorem describing the many notions equivalent 
to graphons. 

Theorem 6.13 The following structures are cryptomorphic: 

(a) a graphon $W \in \mathcal{W}_0$, up to weak isomorphism;

(b) a graph parameter $f$ that is the limit of graph parameters $t(\cdot, G_n)$ for some convergent
graph sequence $(G_n)$;

(c) a multiplicative, reflection positive graph parameter $f$ satisfying $f(K_1) = 1$;

(d) a consistent local random graph model;

(e) an ergodic measure on $\mathcal{G}$ invariant under $\Sigma_2$.

The equivalences of these structures are mostly contained in results mentioned previously.
Let us sketch these constructions.

(a) → (b): Every graphon W ∈ W_0 gives rise to the graph parameter t(., W); furthermore,
W is the limit of a convergent graph sequence (G_n) (for example, of the sequence of W-random
graphs), and for this sequence t(F, G_n) → t(F, W) for all F.

(b) → (c): The graph parameters t(., G_n) satisfy the conditions in (c), and these conditions
are clearly preserved when we pass to the limit.

(c) → (d): In the special case when f = t(., G) is the probability that a random map from F
to some graph G is a homomorphism, we can express the probability that a sample of n points
gives a given graph F_0, by inclusion-exclusion in terms of the numbers f(F). We can apply
the same formula to any graph parameter f satisfying (c), and get a probability distribution on
n-point graphs (here the conditions in (c) are used), which is a consistent local random graph
model.

(d) → (a): Generating a random graph G_n from the consistent local random graph model,
it can be shown that we get a convergent graph sequence with probability 1, which tends to a
graphon W. For this graphon, G(n, W) gives back the random graph model we started with.

(d) ↔ (e): It is easy to see that a consistent random graph model is equivalent to a probability
distribution on Q invariant under S2. The proof that locality is equivalent to ergodicity [111] is
trickier and not given here.
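The step from a graphon to a convergent sequence can be tried out numerically. The following Python sketch samples a W-random graph G(n, W) and compares its edge and triangle densities with t(K_2, W) and t(K_3, W). The graphon W(x, y) = xy, the sample size 300 and all function names are assumptions made for this illustration and do not come from the text; for this W one has t(K_2, W) = 1/4 and t(K_3, W) = 1/27.

```python
import random
from itertools import combinations

def w_random_graph(n, W, rng):
    """Sample G(n, W): draw X_1..X_n uniformly from [0,1], join i and j with probability W(X_i, X_j)."""
    xs = [rng.random() for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < W(xs[i], xs[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def edge_density(adj):
    """t(K_2, G) = hom(K_2, G) / n^2 = 2|E(G)| / n^2."""
    n = len(adj)
    return sum(len(nb) for nb in adj) / n**2

def triangle_density(adj):
    """t(K_3, G) = hom(K_3, G) / n^3, counting ordered triples of pairwise adjacent nodes."""
    n = len(adj)
    hom = sum(len(adj[i] & adj[j]) for i in range(n) for j in adj[i])
    return hom / n**3

def W(x, y):
    # Illustrative graphon, chosen only for this sketch.
    return x * y

G = w_random_graph(300, W, random.Random(0))
print(edge_density(G), triangle_density(G))   # to be compared with 1/4 and 1/27
```

A single sample already lands near the limiting values, and the gap shrinks as n grows.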

Corollary 6.14 A graph parameter f is reflection positive if and only if it is either identically
0, or there is a probability distribution p on the Borel sets of (W_0, δ_□) such that if W denotes a
random function from this distribution, then

    f(F) = E t(F, W).

6.5.2 Examples 

We start with two easy examples. 

Example 6.15 Complete bipartite graphs. It is natural to guess, and easy to prove, that complete
bipartite graphs K_{n,n} converge to the function defined by W(x, y) = 1 if 0 ≤ x ≤ 1/2 ≤ y ≤ 1
or 0 ≤ y ≤ 1/2 ≤ x ≤ 1, and W(x, y) = 0 otherwise.

Example 6.16 Threshold graphs. These graphs are defined on the set {1, . . . , n} by connecting i
and j if and only if i + j ≤ n. These graphs converge to the function defined by W(x, y) = 1_{x+y≤1}.
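As a hedged numerical check of Example 6.16, the sketch below builds the threshold graph on {1, ..., n} and computes its edge and triangle homomorphism densities. For the limit function W(x, y) = 1_{x+y≤1} a direct integration gives t(K_2, W) = 1/2 and t(K_3, W) = 1/4; these two values, the choice n = 100 and the function names are worked out for this illustration and are not stated in the text.

```python
def threshold_graph(n):
    """Adjacency sets of the threshold graph on {1..n}: i ~ j iff i != j and i + j <= n."""
    return {i: {j for j in range(1, n + 1) if j != i and i + j <= n}
            for i in range(1, n + 1)}

def hom_density_edge(adj):
    """t(K_2, G): ordered adjacent pairs divided by n^2."""
    n = len(adj)
    return sum(len(nb) for nb in adj.values()) / n**2

def hom_density_triangle(adj):
    """t(K_3, G): ordered triples of pairwise adjacent nodes divided by n^3."""
    n = len(adj)
    hom = sum(len(adj[i] & adj[j]) for i in adj for j in adj[i])
    return hom / n**3

G = threshold_graph(100)
print(hom_density_edge(G), hom_density_triangle(G))   # approach 1/2 and 1/4 as n grows
```

The densities differ from the limiting values by terms of order 1/n, which is the expected discretization error.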

Example 6.17 A sequence of graphs tending to the identically-p function is exactly what we 
called a quasirandom sequence with density p. 

Two examples of randomly growing graph sequences: 

Example 6.18 Randomly grown uniform attachment graph. We start with a single node. At
the n-th iteration, a new node is born, and then every pair of nonadjacent nodes is connected
with probability 1/n. We call this graph sequence a randomly grown uniform attachment graph
sequence.

Let us do some simple calculations. After n steps, let {0, 1, . . . , n − 1} be the nodes (born in
this order). The probability that nodes i < j are not connected is

    \frac{j}{j+1} \cdot \frac{j+1}{j+2} \cdots \frac{n-1}{n} = \frac{j}{n}.

These events are independent for all pairs. From here, one can easily figure out that the expected
number of edges is (n² − 1)/6.

To describe the limit function, note that the probability that nodes i and j are not connected
is max(i, j)/n. If i = xn and j = yn, then this is max(x, y). Using that these events are
independent, we can prove that the graph sequence tends to the limit function 1 − max(x, y)
with probability 1.
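The growth process of Example 6.18 is easy to simulate. The sketch below is a minimal illustration under the reading that the single starting node counts as step 1, so that node k − 1 is born at step k; the function name, the seed and the size n = 150 are assumptions of this sketch. It compares the observed number of edges with the expected value (n² − 1)/6.

```python
import random

def uniform_attachment(n, rng):
    """Run n steps of the process; return the edge set on nodes 0, ..., n-1."""
    edges = set()
    for step in range(1, n + 1):              # node step-1 is born at this step
        for j in range(step):
            for i in range(j):
                # every currently nonadjacent pair is joined with probability 1/step
                if (i, j) not in edges and rng.random() < 1.0 / step:
                    edges.add((i, j))
    return edges

n = 150
edges = uniform_attachment(n, random.Random(1))
print(len(edges), (n * n - 1) / 6)            # observed vs expected number of edges
```

Since the pairs behave independently, the edge count is concentrated around its mean, which is why one run is already informative.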

Example 6.19 Randomly grown prefix attachment graph. In this construction, it will be more
convenient to label the nodes starting with 1. At the n-th iteration, a new node n is born, a
node z ≤ n is selected at random, and the new node is connected to nodes 1, . . . , z − 1. We
denote the n-th graph in the sequence by G_n^{pfx}, and call this graph sequence a randomly grown
prefix attachment graph sequence.

The expected number of edges is n(n − 1)/4, and with some effort one can compute subgraph
densities to see that the sequence is convergent with probability 1. It is more difficult to figure
out the limit graphon.

We can try to proceed as in the case of uniform attachment graphs. The probability
that i and j are connected is |i − j| / max(i, j); if i = xn and j = yn, then this is |x − y| / max(x, y).
Does this mean that the function U(x, y) = |x − y| / max(x, y) is the limit? Surprisingly, the
answer is negative, which we can see by computing triangle densities.

The key to describing the limit is the remark at the end of Section 1.5.3, namely that instead of
the uniform distribution over the interval [0, 1], we can use other probability spaces. Let us label
a node born in step k, connected to {1, . . . , m}, by the pair (k/n, m/k) ∈ [0, 1] × [0, 1]. Then
we can observe that nodes with label (x_1, y_1) and (x_2, y_2) are connected if and only if either
x_1 < x_2 y_2 or x_2 < x_1 y_1.
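The labeling just described can be checked on a simulated prefix attachment graph. The sketch below assumes that z is chosen uniformly from {1, ..., k} when node k is born (the reading under which the expected edge count n(n − 1)/4 is exact), and it tests the connection rule with non-strict inequalities, which is the exact form for finite n and differs from the strict form only on a set of measure zero in the limit. Function names, seed and size are assumptions of this sketch.

```python
import random
from fractions import Fraction

def prefix_attachment(n, rng):
    """Return (edges, m): node k is joined to nodes 1..m[k], where m[k] = z - 1 and z is uniform on {1..k}."""
    edges, m = set(), {}
    for k in range(1, n + 1):
        z = rng.randint(1, k)
        m[k] = z - 1
        edges.update((i, k) for i in range(1, z))
    return edges, m

def rule_connected(n, m, a, b):
    """Connection rule in terms of the labels (k/n, m/k), evaluated exactly with fractions."""
    xa, ya = Fraction(a, n), Fraction(m[a], a)
    xb, yb = Fraction(b, n), Fraction(m[b], b)
    return xa <= xb * yb or xb <= xa * ya

n = 300
edges, m = prefix_attachment(n, random.Random(2))
agree = all(((a, b) in edges) == rule_connected(n, m, a, b)
            for a in range(1, n + 1) for b in range(a + 1, n + 1))
print(agree, len(edges), n * (n - 1) / 4)
```

The rule holds for every pair by construction, while the edge count fluctuates noticeably from run to run because each node contributes a uniformly distributed number of edges.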

From this observation one can prove that the prefix attachment graphs G_n^{pfx} converge, with
probability 1, to the function W^{pfx} : [0, 1]² × [0, 1]² → [0, 1], given by

    W^{pfx}((x_1, y_1), (x_2, y_2)) =
    \begin{cases}
      1, & \text{if } x_1 < x_2 y_2 \text{ or } x_2 < x_1 y_1, \\
      0, & \text{otherwise.}
    \end{cases}

This gives a nice and simple representation of the limit object with the underlying probability
space [0, 1]² (with the uniform measure). If we want a representation on [0, 1], we can map [0, 1]
into [0, 1]² by any measure preserving map φ; then W_φ^{pfx}(x, y) = W^{pfx}(φ(x), φ(y)) gives a weakly
isomorphic graphon. This function is 0-1 valued, but its support is fractal-like.

It is interesting to note that the graphs G(n, W) form a different growing sequence of random 
graphs tending to the same limit W with probability 1. 

6.6 Convergence from the right 

Paper |3p contains several conditions that characterize convergent dense graph sequences in 
terms of homomorphisms "to the right" (we have seen that these correspond to parameters with 
meaning in statistical physics). We only state one of these, in our terms: 



Theorem 6.20 Let (G_n) be a sequence of graphs such that |V(G_n)| → ∞ as n → ∞. Then
the sequence (G_n) is convergent if and only if the restricted multicut densities rmcut(G_n, H) are
convergent for every weighted graph H.

The value rmcut in this theorem could be replaced by hom*(G, H), and by our
discussion in Section 2.3.3 we could talk about microcanonical ground state energies instead of
restricted multicuts.

6.7 Limits of other dense combinatorial structures 

Limit objects can be defined for multigraphs, directed graphs, colored graphs, hypergraphs etc. 
In many cases, like directed graphs without parallel edges, or graphs with nodes colored with a 
fixed number of colors, this can be done along the same lines as for simple graphs. 

But in other cases there are some surprises. For example, limits of multigraphs with edge- 
multiplicities are not real valued functions, but 2-variable functions whose values are random 
variables with nonnegative integral values [5S]. If W is such a function, we can generate a 
W-random multigraph by selecting n independent random points X_1, . . . , X_n from the uniform
distribution on [0, 1], and then connecting nodes i and j with W(X_i, X_j) parallel edges (which
is a random integer).

The case of hypergraphs is much more interesting and important. Formulating regularity 
lemmas and constructing limits of sequences of r-uniform hypergraphs, where r is fixed, is a 
highly nontrivial task, but it is essentially solved now, thanks to the work of Rödl and Skokan
and Gowers [101 [5^; see also [TTilli^. 

However, it seems that no good extension of the distance δ_□ has been found to hypergraphs
(just as for the regularity lemma, the first natural guesses are wrong). Another open question is 
to extend these results to nonuniform hypergraphs, with unbounded edge-size. 

The semidefiniteness conditions for homomorphism functions can be extended to hypergraphs 
(see e.g. [80]). One area of applications of these conditions is extremal graph theory, and it is 
natural to ask if the semidefiniteness conditions can be useful in extremal hypergraph theory, 
especially since extremal problems for hypergraphs tend to be much harder than for graphs, and 
even basic questions are unsolved. 

7 Convergence and limits II: bounded degree graphs 
7.1 Neighborhood sampling

Recall the sampling process for bounded degree graphs: For a fixed nonnegative integer r, we
select uniformly a random node v ∈ V(G), and return the ball B_G(v, r) with center v and radius
r (i.e., the subgraph induced by those nodes that can be reached from v on a path of length r
or less). For a given rooted graph F, we denote by p_{G,r}(F) the probability that this sampling
method returns F (with the root as the center). So p_{G,r}(F) defines a probability distribution
on rooted graphs F with radius at most r, which we denote by P_{G,r}.

We use this method if the degrees of nodes in G are bounded by a fixed number d; then the
number of possible neighborhoods is finite.
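A minimal sketch of neighborhood sampling follows. It computes the ball B_G(v, r) by breadth-first search and tabulates how often each type of ball occurs when v ranges over all nodes, which is what uniform sampling of v estimates. Deciding rooted isomorphism in general is beyond this sketch, so the function `signature` uses a crude invariant (distance from the root and degree inside the ball) as a stand-in for the isomorphism type; this invariant, the n × n grid and all names are assumptions of the illustration.

```python
from collections import Counter, deque

def grid_graph(n):
    """Adjacency sets of the n x n grid."""
    adj = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in list(adj):
        for di, dj in ((1, 0), (0, 1)):
            if i + di < n and j + dj < n:
                adj[(i, j)].add((i + di, j + dj))
                adj[(i + di, j + dj)].add((i, j))
    return adj

def ball(adj, v, r):
    """Distances from v of all nodes within distance r (breadth-first search)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def signature(adj, v, r):
    """Crude stand-in for the rooted isomorphism type of B_G(v, r)."""
    dist = ball(adj, v, r)
    nodes = set(dist)
    return tuple(sorted((d, len(adj[u] & nodes)) for u, d in dist.items()))

def neighborhood_distribution(adj, r):
    """Fraction of nodes v whose r-ball has each signature."""
    counts = Counter(signature(adj, v, r) for v in adj)
    return {sig: c / len(adj) for sig, c in counts.items()}

dist = neighborhood_distribution(grid_graph(10), 1)
for sig, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(p, sig)
```

On the 10 × 10 grid with r = 1 there are three kinds of balls (interior, side and corner nodes), and the interior type takes up a fraction (n − 2)²/n² of the nodes, which tends to 1 as the grid grows.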

A sequence of graphs (G_n) with degrees uniformly bounded by d and |V(G_n)| → ∞ is
convergent (or more precisely locally convergent) if the neighborhood densities p_{G_n,r}(F) converge
for all r and all finite rooted graphs F.

As for subgraph sampling, there are equivalent density type parameters whose
convergence could be used instead of the neighborhood densities; for example, we could stipulate
the convergence of s(F, G_n) for every connected graph F.

7.2 Local (weak) limit 

7.2.1 Different forms 

A weakly convergent bounded degree graph sequence has several, not quite equivalent limit
objects, which we have introduced in Section 3.2. Part (a) of the following theorem is due to
Benjamini and Schramm [16]; part (b) was formulated by R. Kleinberg (unpublished); part (c),
which implies (b), is due to Elek [37].

Theorem 7.1 Let (G_n) be a locally convergent sequence of graphs with degrees bounded by d.
Then

(a) There is a unique unimodular distribution τ on countable rooted graphs with degrees
bounded by d such that P_{G_n,r} → P_{τ,r}.

(b) There is a measure preserving graph G such that P_{G_n,r} → P_{G,r} for every r ≥ 1.

(c) There is a graphing G such that P_{G_n,r} → P_{G,r} for every r ≥ 1.

Note that in (b) we don't claim uniqueness. We could replace "graphing" by "measure
preserving graph".

A big difference from the dense case is that there does not seem to be any easy way to 
construct a sequence that converges to a given graphing in this sense. 

Conjecture 7.2 (Aldous-Lyons) Every graphing is the limit of a convergent sequence of
bounded-degree graphs. Equivalently, every unimodular distribution on rooted countable graphs
with bounded degree is the limit of a bounded degree graph sequence.

7.2.2 Is the limit informative enough? 

The problem of the Regularity Lemma is related to Conjecture 7.2. Indeed, suppose that we have
a constructive way of finding, for an arbitrarily large graph G with bounded degree, a graph H
of size bounded by a function of r and ε that approximates the distribution of r-neighborhoods
in G with error ε. The same construction should also work with a graphing instead of G. Letting
r → ∞ and ε → 0, this would give a sequence of finite bounded degree graphs converging to the
given graphing.

Part of the problem is to recognize "globally" when H is a good approximation of G. Is there
a good notion of "distance" (analogous to δ_□) for graphs with bounded degree?

The limit graphon of a dense sequence of graphs contains very much information about the
asymptotic properties of the sequence. This is not so for the bounded degree case, unfortunately.

Problem 7.3 Is there a notion of convergence for graphs with bounded degree that is stronger 
than Benjamini-Schramm? (For example, one should be able to read off from the limit that the 
graphs are expanders.) 

Let us illustrate this by a couple of simple examples. 

Example 7.4 Let (G_n) be a sequence of 3-regular bipartite expander graphs with their girth
tending to infinity. Let H_n consist of two disjoint copies of G_n. The Benjamini-Schramm limit
of both sequences is a distribution concentrated on a single 3-regular rooted tree. In the Elek
description, we get a graphing (Ω, T_1, T_2, T_3), where T_1, T_2 and T_3 generate a free group which
acts on Ω without fixed points.

This limit graphing is not uniquely determined. One feels that in the case of the limit of
the sequence (G_n), the action of the free group should be ergodic, while in the case of the H_n,
Ω should split into two invariant subsets of measure 1/2. So it appears that in the limit object,
the underlying σ-algebra also carries combinatorial information. This is in stark contrast with
the dense case [26].

Example 7.5 Let Gn denote the n x n grid. The Benjamini-Schramm limit object is a proba- 
bility distribution concentrated on the infinite grid with a specified root (the "origin"). A limit 
graphing can be described as the uniform measure on the 2-dimensional torus, together with the
rotations by an irrational number α in one coordinate and the other.

However, in many respects the "right" limit object of the sequence of grids is a solid square. 
In other words, instead of larger and larger pieces of the infinite grid, we consider finer and finer 
subdivisions of the unit square. 

This last example suggests that we can consider our graphs "on a different scale", and study
them as metric spaces with the usual graph distance as metric, normalized by the diameter. We 
can then consider the limit of these metric spaces in the sense of Gromov [55]. For example, the 
limit of a sequence of larger and larger square grids in this sense is a (full) square. This global 
structure is not revealed by the Benjamini-Schramm limit. 

It is easy to construct examples where the interesting structure of the graphs appears on
an intermediate scale. It would be very interesting to describe and possibly unify limit objects
belonging to different scales. Perhaps we can understand different limit objects using
ultraproducts, similarly to the work of Elek and Szegedy in the dense case.

7.3 Convergence from the right 

While the description of convergent sequences in the bounded degree case lacks some of the key
results that hold in the dense case, most notably a good notion of distance, we can formulate
a result (Borgs, Chayes, Kahn and Lovász [25]) which shows that convergence defined in terms
of homomorphisms from the left and homomorphisms to the right are equivalent under some
circumstances.

To state this, let us define for every simple graph G and weighted graph H the quantity

    u(G, H) = \frac{\log \hom(G, H)}{|V(G)|}.

To see the meaning of u(G, H), consider the case when H is a simple graph with q nodes. Then
hom(G, H) ≤ q^{|V(G)|}, and so after taking the logarithm and dividing by |V(G)| we get a number
at most log q. So u(G, H) expresses the freedom (entropy) we have in choosing the image of a
node v ∈ V(G) in a homomorphism G → H.

For a weighted graph H, we define and 



Theorem 7.6 Let (G_n) be a sequence of graphs with maximum degree at most d.

(a) If (G_n) is convergent, then for every weighted graph H with D(H) < 1/(2d),
the sequence u(G_n, H) is convergent.

(b) Assume that for every q ≥ 1 there is an ε_q > 0 such that for every weighted graph H
on q nodes with D(H) < ε_q the sequence u(G_n, H) is convergent. Then the sequence (G_n) is
convergent.

In the special case when H = K_q is the complete graph on q nodes (without loops), we have
D(K_q) = 1/q, and hom(G, K_q) is the number of q-colorings of G. So it follows that if (G_n) is
convergent and q > 2d, then the number of q-colorings grows as c^{|V(G_n)|} for some c. It is easy
to see that some condition on q is needed: for example, if G_n is the n-cycle and q = 2, then
u(G_n, K_2) oscillates between −∞ and ≈ 0 as a function of n.
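The oscillation for cycles can be seen in a few lines. The sketch below uses the standard identity hom(C_n, H) = trace(A^n) for a graph H with adjacency matrix A and unit node weights, and evaluates u(C_n, K_q) = log hom(C_n, K_q) / n. The function names are assumptions of this sketch, and the value −∞ is returned by convention when there is no homomorphism at all.

```python
import math

def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def hom_cycle(n, A):
    """hom(C_n, H) = trace(A^n), where A is the adjacency matrix of H and n >= 3."""
    P = A
    for _ in range(n - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))

def complete_graph(q):
    """Adjacency matrix of K_q (no loops)."""
    return [[0 if i == j else 1 for j in range(q)] for i in range(q)]

def u(n, A):
    """u(C_n, H) = log hom(C_n, H) / n, with -inf when hom(C_n, H) = 0."""
    h = hom_cycle(n, A)
    return math.log(h) / n if h > 0 else float("-inf")

for n in (7, 8, 9, 10):
    print(n, u(n, complete_graph(2)), u(n, complete_graph(3)))
```

For q = 2 the odd cycles have no proper coloring, so the value jumps between −∞ and log(2)/n, while for q = 3 the values settle quickly near log 2.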



8 Testing

What can we learn about a huge graph G from sampling? There are two related, but slightly
different ways of asking this question: property testing and parameter estimation.

8.1 Sample concentration

Before discussing these tasks, let us address the following concern: if we take a bounded size
sample from a graph, we can see very different graphs. For a random graph, for example, we
can see anything. The natural way to use the sample G[S] is to compute some graph parameter
f(G[S]). But this parameter can vary wildly with the choice of the sample, so what information
do we get?

The following two theorems assert that every reasonably smooth parameter of a sample is
highly concentrated. (Note: we don't say anything here about the value of the parameter on the
whole graph.)

The first version applies to parameters where smoothness is defined locally. The proof depends
on the theory of martingales (Azuma's Inequality).

Theorem 8.1 Let f be a graph parameter and assume that |f(G) − f(G')| ≤ 1 for any two
graphs on the same node set which differ only in edges incident with a single node. Then for
every graph G and 1 ≤ k ≤ |V(G)| there is a value f_0 such that if S ⊆ V(G) is a random
k-subset, then for every t > 0,



with probability at least 1 − e^{−t}.

The second version applies to parameters which are smooth with respect to our global distance
function. The proof follows from a modification of the proof of Theorem 6.6.

Theorem 8.2 Let f be a graph parameter and assume that |f(G) − f(G')| ≤ d_□(G, G') for any
two graphs on the same node set. Then for every graph G and 1 ≤ k ≤ |V(G)| there is a value
f_0 such that if S ⊆ V(G) is a random k-subset, then



with probability at least 1 — 2 . 

8.2 Parameter estimation 

We want to determine some parameter of a very large graph G. Of course, we'll not be able to de- 
termine the exact value of this parameter; the best we can hope for is that if we take a sufficiently 
large sample, we can find the approximate value of the parameter with large probability. 

To be precise, a graph parameter f is testable, if for every ε > 0 there is a positive integer k
such that if G is a graph with at least k nodes and we select a set X of k independent uniform
random nodes of G, then from the subgraph G[X] induced by them we can compute an estimate
g(G[X]) of f such that

    P(|f(G) - g(G[X])| > \varepsilon) < \varepsilon.

It is an easy observation that we can always use g(G[X]) = f(G[X]) (cf. [57]).
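For the simplest testable parameter, the edge density f = t(K_2, .), the estimate g(G[X]) = f(G[X]) can be sketched as follows. The huge graph is represented by an adjacency predicate rather than stored explicitly; the complete bipartite graph K_{m,m}, the sample size k = 200 and the names are assumptions of this illustration. The k independent draws are collected into a set, so a repeated node is used only once.

```python
import random

def estimate_edge_density(n, adjacent, k, rng):
    """Estimate t(K_2, G) from the subgraph induced by k random nodes of {0, ..., n-1}."""
    X = sorted({rng.randrange(n) for _ in range(k)})   # k independent draws; repeats merge
    hits = sum(1 for a in X for b in X if a != b and adjacent(a, b))
    return hits / len(X) ** 2

m = 500_000

def adjacent(u, v):
    # K_{m,m} on nodes 0..2m-1: nodes are adjacent iff they lie in different halves.
    return (u < m) != (v < m)

estimate = estimate_edge_density(2 * m, adjacent, 200, random.Random(0))
print(estimate)     # the true edge density t(K_2, G) is 1/2
```

The cost depends only on k, not on the number of nodes of G, which is the point of the definition.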

It is easy to see that testability is equivalent to saying that for every convergent graph
sequence (G_n), the sequence of numbers (f(G_n)) is convergent. (So graph parameters of the form
t(F, .) are testable by the definition of convergence.) This is, however, more-or-less just a
reformulation of the definition. Paper [29] contains a number of more useful conditions characterizing
testability of a graph parameter. We formulate one, which is perhaps easiest to verify:

Theorem 8.3 A graph parameter f is testable if and only if the following three conditions hold:

(i) For every ε > 0 there is an ε' > 0 such that if G and G' are two simple graphs on the
same node set and d_□(G, G') ≤ ε' then |f(G) − f(G')| ≤ ε.

(ii) For every simple graph G, f(G(m)) has a limit as m → ∞. (Recall that G(m) denotes
the graph obtained from G by blowing up each node into m twins.)

(iii) If G^+ is obtained from G by adding a single isolated node, then f(G^+) − f(G) → 0 if
|V(G)| → ∞.

Note that all three conditions are special cases of the statement that

(iv) if |V(G_n)|, |V(G'_n)| → ∞ and δ_□(G_n, G'_n) → 0, then f(G_n) − f(G'_n) → 0.

This condition is also necessary, so it is equivalent to its own three special cases (i)-(iii) in
the Theorem.

Example 8.4 As a basic example, consider the density of maximum cuts (recall Section 2.3.2).
One of the first substantial results on property testing [56], [12] is that this parameter is testable.
It is relatively easy to see (using high concentration results like Azuma's inequality) that if S is a
sufficiently large random subset of nodes of G, then maxcut(G[S]) ≥ maxcut(G) − ε: a large cut
in G, when restricted to S, gives a large cut in G[S]. It is harder, and in fact quite surprising,
that if most subgraphs G[S] have a large cut, then so does G. This follows from Theorem 8.3
above, since conditions (i)-(iii) are easily verified for f = maxcut.

Example 8.5 The free energy (ITBl) for a fixed weighted graph H is a more complicated example 
of a testable parameter, which illustrates the power of Theorem 8.3. It is difficult to verify directly
either the definition, or say condition (iv). The theorem splits this into three: condition (i) is
easy by the definition of d_□(G, G'); (ii) is a matter of classical combinatorics, counting mappings
that split the twin classes in given proportions; finally, (iii) is trivial.

8.3 Dense property testing 

Instead of estimating a numerical parameter, we may want to determine some property of G: 
Is G 3-colorable? Is it connected? Does it have a triangle? The answer will of course have 
some uncertainty. A precise definition was given by Rubinfeld and Sudan [101] and Goldreich,
Goldwasser and Ron [56]. In the slightly different context of "additive approximation", closely
related problems were studied by Arora, Karger and Karpinski [12] (see e.g. [45] for a survey).
Many extensions deal with situations where we are allowed to sample more than a constant 
number of nodes of the large graph G; our concern will be the original setup, where the sample 
size is bounded. 

A graph property P is testable, if there exists another property P' (called a "test property")
such that

(a) if a graph G has property P, then for all 1 ≤ k ≤ |V(G)| at least 2/3 of its k-node induced
subgraphs have property P', and

(b) for every ε > 0 there is a k_ε ≥ 1 such that if G is a graph whose edit distance from P is
at least ε|V(G)|², then for all k_ε ≤ k ≤ |V(G)| at most a fraction of 1/3 of the k-node induced
subgraphs of G have property P'.

This notion of testability is usually called oblivious testing, which refers to the fact that no
information about the size of G is assumed. The constants 1/3 and 2/3 are arbitrary, and it would
not change the notion of testability if we replaced them by any two real numbers 0 < a < b < 1.

It is surprising that this rather restrictive definition allows many testable graph properties:
for example, bipartiteness, triangle-freeness, every property definable by a first order formula

A surprisingly general result was proved by Alon and Shapira [5^. A graph property P is
called hereditary if G ∈ P implies that G' ∈ P for every induced subgraph G' of G.

Theorem 8.6 (Alon-Shapira) Every hereditary graph property is testable.
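To see what such a test looks like for one hereditary property, here is a minimal sketch for triangle-freeness, using the property itself as the test property on the sampled induced subgraph. It only illustrates the easy direction, namely that a graph with the property is always accepted because the property is inherited by induced subgraphs; that graphs far from the property are rejected with good probability is the deep part of the theorem and is not demonstrated here. The adjacency predicates, the sample size k = 20 and the function names are assumptions of this sketch.

```python
import random
from itertools import combinations

def has_triangle(adjacent, nodes):
    """True if some three of the given nodes are pairwise adjacent."""
    return any(adjacent(a, b) and adjacent(b, c) and adjacent(a, c)
               for a, b, c in combinations(nodes, 3))

def looks_triangle_free(n, adjacent, k, rng):
    """Sample k distinct nodes of {0, ..., n-1} and accept iff the induced subgraph has no triangle."""
    sample = rng.sample(range(n), k)
    return not has_triangle(adjacent, sample)

def bipartite(u, v):
    # Complete bipartite graph between even and odd nodes: triangle-free.
    return (u % 2) != (v % 2)

def complete(u, v):
    # Complete graph: as far from triangle-free as possible.
    return u != v

print(looks_triangle_free(10**6, bipartite, 20, random.Random(0)))   # True
print(looks_triangle_free(10**6, complete, 20, random.Random(0)))    # False
```

The answer for the bipartite graph is True for every sample, so this test never errs on graphs that have the property.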

Fischer and Newman [46] proved that a property is testable if and only if the normalized
edit distance from the property is a testable parameter. Alon et al. characterized testable graph
properties in terms of Szemerédi partitions [7].

Going to the limit gives a tool for studying testability in a "cleaner" form (Lovász and Szegedy
[88]). It turns out that this leads to an interesting interplay between the cut-norm and the
L_1-norm on W_0.

A graph property P can be thought of as a subset of W_0 (through the correspondence G ↦ W_G),
and we can consider its closure \overline{P} in the metric space (W_0, δ_□). For example, the closure
of the set of triangle-free graphs is the set of triangle-free graphons, which can be characterized
by the property t(K_3, W) = 0. More generally, let P be a hereditary graph property. Then its
closure is characterized by the (infinitely many) equations

    t_{ind}(F, W) = 0 \quad \text{for all } F \notin P.    (38)

Closures of testable graph properties will be called testable graphon properties. These graphon
properties can also be characterized in terms of a sampling method: we consider the W-random
graph G(k, W) as the sample of size k from W.

Theorem 8.7 A graphon property R is testable if and only if there is a graph property R' such
that

(a) Pr(G(k, W) ∈ R') ≥ 2/3 for every function W ∈ R and every k ≥ 1, and

(b) for every ε > 0 there is a k_ε ≥ 1 such that Pr(G(k, W) ∈ R') ≤ 1/3 for every k ≥ k_ε and
every function W ∈ W_0 with d_1(W, R) ≥ ε.

We quote an analytic characterization of testable graphon properties [55]. Recall that the 
distances d_1 and d_□ are related trivially by d_□ ≤ d_1. Testability of a property concerns an
inverse relation:

Theorem 8.8 A graphon property R is testable if and only if either one of the following
conditions holds:

(a) For every ε > 0 there is an ε' > 0 such that if d_□(W, R) < ε' for some graphon W, then
d_1(W, R) < ε.

(b) d_1(W, R) is a continuous function of W in the cut norm.

Condition (b) can be viewed as the graphon analogue of the theorem of Fischer and Newman
mentioned above (and the finite theorem can be derived from it). Condition (a) is a special case
of (b).

Example 8.9 Let R = {U}, where U ∈ W is the identically 1/2 function. Clearly this property
is invariant under weak isomorphism. Consider the random graphs G_n = G(n, 1/2); then
‖W_{G_n} − U‖_□ → 0 with probability 1, but ‖W_{G_n} − U‖_1 = 1/2 for every n. So this property
is not testable by Theorem 8.8.

Let us sketch how the graphon version of Theorem 8.6 follows from this. A property R of
functions W ∈ W_0 is called flexible if for every function U such that U(x, y) = W(x, y) for all
x, y with W(x, y) ∈ {0, 1}, we also have U ∈ R. First, one proves that

Lemma 8.10 The closure of a hereditary property is flexible.

Indeed, each of the equations (38) is preserved if we change the value of W at points where
this value is strictly between 0 and 1.

Next, we assume that R is a closed flexible property which is not testable. By Theorem 8.8
there is a sequence of functions W_n such that d_□(W_n, R) → 0 but d_1(W_n, R) > ε for some fixed
ε > 0. By Theorem 4.2 we may assume that W_n converges to some W ∈ R in the cut norm.
Let S_0 = W^{-1}(0), S_1 = W^{-1}(1) and let Z_n ∈ W_0 denote the function which is 1 on S_1, 0 on
S_0, and is identical with W_n anywhere else. By flexibility, we have Z_n ∈ R, and by (34),

    \|W_n - Z_n\|_1 = \int_{S_0} W_n + \int_{S_1} (1 - W_n) \to \int_{S_0} W + \int_{S_1} (1 - W) = 0 \qquad (n \to \infty),

and so d_1(W_n, R) → 0, a contradiction. So it follows that the closure of every hereditary property
is testable.

From this, one can derive that hereditary properties are testable. Some further
argument is needed, since a graph property can have a testable closure without itself being testable.
(An example is the property that the graph is complete if the number of nodes is even but edgeless
if the number of nodes is odd.) One can add further conditions that lead to a characterization,
but we don't go into these technical issues here.

8.4 Sparse property testing 

We say that a graph property P is testable for graphs in G_d if for every ε > 0 there are integers
r = r(d, ε) ≥ 1 and k = k(d, ε) such that sampling k neighborhoods of radius r from a graph G
with degree bounded by d, we can compute "YES" or "NO" so that:

(a) if we answer "NO", then G ∉ P;

(b) if we answer "YES", then we can change at most ε|V(G)| edges in G to get a graph in P.

An important analogue of the result of Alon and Shapira discussed above is the following
theorem of Benjamini, Schramm and Shapira [17]. We must recall a fundamental notion from
graph theory: a minor of a graph G is any other graph obtained from G by deleting edges
and/or nodes, and contracting edges. A graph property is minor-closed, if it is preserved by
these operations. Planarity of a graph is an example of a minor-closed property.

Theorem 8.11 Every minor-closed property is testable for graphs with bounded degrees.

A related result was proved by Elek [55]:

Theorem 8.12 If a graph property is preserved by edge/node deletion and disjoint union, then
it is testable for graphs with bounded degrees and subexponential growth.

9 Extremal graph theory 
9.1 Some classical results 

In this section we describe applications of the theory of graph homomorphisms and graph limits 
to extremal graph theory. As an introduction, let us recall some classical results. 

Define the Turán graph T(n, r) (1 ≤ r ≤ n) as follows: we partition [n] into r classes as
equitably as possible, and connect two nodes if and only if they belong to different classes.

Theorem 9.1 (Turán's Theorem) Among all graphs on n nodes containing no K_k, the graph
T(n, k − 1) has the maximum number of edges.

Since we are interested in large n and fixed k, the complication that the classes cannot be
exactly equal in size (which causes the formula for the number of edges of T(n, k − 1) to be a
bit ugly) should not worry us. We will be interested in the following corollary:

Corollary 9.2 If a graph on n nodes has more than \binom{k-1}{2} (n/(k−1))² edges, then it
contains a K_k.

The case k = 3 was proved by Mantel before Turán. We will use this case to illustrate the
ideas, but the general case could be treated similarly.
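The "ugly" exact edge count of the Turán graph and the clean bound used in the corollary can be compared directly. The sketch below computes the number of edges of T(n, r) from its class sizes and checks it against (1 − 1/(k−1)) n²/2, which equals \binom{k-1}{2} (n/(k−1))²; the function names are assumptions of this illustration.

```python
def turan_edges(n, r):
    """Number of edges of T(n, r): complete r-partite graph with class sizes as equal as possible."""
    q, s = divmod(n, r)                    # s classes of size q + 1, r - s classes of size q
    sizes = [q + 1] * s + [q] * (r - s)
    return (n * n - sum(c * c for c in sizes)) // 2

def turan_bound(n, k):
    """(1 - 1/(k-1)) * n^2 / 2, the edge bound above which a K_k must appear."""
    return (k - 2) / (k - 1) * n * n / 2

for n, k in ((10, 3), (7, 3), (12, 4), (13, 4)):
    print(n, k, turan_edges(n, k - 1), turan_bound(n, k))
```

The two numbers agree exactly when k − 1 divides n, and otherwise the Turán graph falls short of the bound by less than k/8 edges, which is negligible for large n and fixed k.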

One can ask for not just the existence of complete k-graphs, but for their number.
Generalizing Turán's Theorem, the following lower bound was proved by Goodman (for k = 3) and by
Moon and Moser.

Theorem 9.3 If a graph on n nodes has a\binom{n}{2} edges (0 ≤ a ≤ 1), then it contains at least
a(2a − 1) · · · ((k − 1)a − k + 2) \binom{n}{k} complete k-graphs.

This bound is tight for Turán graphs, but their edge density attains only certain values of a.
The best lower bound in terms of a and n is quite complicated. To illustrate these complications,
we represent each graph G by the point (t(K_2, G), t(K_3, G)) in the unit square (see Figure 2).
The lower bounding curve consists of infinitely many concave cubic arcs, and its validity was only
recently proved by Razborov [98]. This was extended to the best lower bound on the number of
K_4's by Nikiforov [95], but even the edge-K_q diagram is only conjectural [83] for q ≥ 5.

One can also ask for an upper bound on the number of complete k-graphs in a graph with
given number of edges. A special case of the Kruskal-Katona Theorem answers this (the whole
theorem gives the precise value, not just asymptotics, and concerns uniform hypergraphs, not
just graphs).

Theorem 9.4 If a graph on n nodes has a\binom{n}{2} edges (0 ≤ a ≤ 1), then it contains at most
a^{k/2} \binom{n}{k} complete k-graphs.

Figure 2: Possible edge and triangle densities of a graph

Asymptotic equality is attained when the graph consists of a clique and isolated nodes. Not
every edge density a can be realized by such graphs, but the attainable edge densities are dense
in [0, 1], and so Theorem 9.4 is asymptotically tight for all values of a.

Instead of counting complete graphs, one can consider the number of copies of some other
graph F in G. We have already come across counting 4-cycles twice: in Section 1.4.3 and in
Section 1.5.4. Giving just the simpler asymptotic version:

Theorem 9.5 (Erdős) If a graph on n nodes has a\binom{n}{2} edges (0 ≤ a ≤ 1), then it contains at
least (1/8 + o(1)) a⁴ n⁴ 4-cycles.

Graphs with asymptotic equality here are quasirandom graphs. 

The number of paths of length k is a more difficult question, but it turns out to be equivalent
to a theorem of Blakley and Roy [18] in matrix theory. Again asymptotically:

Theorem 9.6 If a graph on n nodes has a\binom{n}{2} edges (0 ≤ a ≤ 1), then it contains at least
(1/2 + o(1)) a^{k−1} n^k paths of length k.

Regular graphs give asymptotic equality here.

9.2 Algebraic proofs of extremal graph results

The classical extremal problems in the previous section can be expressed as algebraic inequalities 
between the subgraph densities t{F, W) that hold for all graphons W. Often "going to the 
infinity" provides cleaner formulations (no error terms). Here are a few examples: 

Example 9.7

(a) Turán's theorem. We state just the case of triangles (due to Mantel):

    t(K_3, W) = 0 \;\Longrightarrow\; t(K_2, W) \le 1/2,    (39)

which follows from the algebraic inequality due to Goodman [58]:

    t(K_3, W) \ge t(K_2, W)(2t(K_2, W) - 1).    (40)

(b) The Kruskal-Katona theorem for graphs:

    t(K_3, W) \le t(K_2, W)^{3/2}.    (41)

(c) Erdős's bound on the number of quadrilaterals:

    t(C_4, W) \ge t(K_2, W)^4.    (42)

(d) The Blakley-Roy inequality:

    t(P_k, W) \ge t(K_2, W)^{k-1}.    (43)

(e) The Sidorenko Conjecture (unsolved) generalizes the last two results in the direction that
for every bipartite graph F,

    t(F, W) \ge t(K_2, W)^{|E(F)|}.    (44)

This conjecture is proved for trees, many small graphs, complete bipartite graphs (Sidorenko
[107]) and also for cubes (Hatami [M]).
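Since t(F, G) = t(F, W_G) for every finite graph G, the inequalities of this example can be spot-checked on random graphs. The sketch below computes homomorphism densities from powers of the adjacency matrix and tests Goodman's inequality (40), the Kruskal-Katona bound (41), the quadrilateral bound (42) and the Blakley-Roy inequality (43) for the path with three nodes. The graph size, the edge probabilities and the function names are assumptions of this illustration, and a numerical check of this kind is of course no substitute for a proof.

```python
import random

def random_adjacency(n, p, rng):
    """Symmetric 0-1 adjacency matrix of a random graph with edge probability p."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def densities(A):
    """Homomorphism densities of K_2, P_3 (path on 3 nodes), K_3 and C_4 in the graph with matrix A."""
    n = len(A)
    A2 = mat_mul(A, A)
    A3 = mat_mul(A2, A)
    A4 = mat_mul(A2, A2)
    trace = lambda M: sum(M[i][i] for i in range(n))
    return {
        "K2": sum(map(sum, A)) / n**2,     # hom(K_2, G) = sum of entries of A
        "P3": sum(map(sum, A2)) / n**3,    # hom(P_3, G) = sum of entries of A^2
        "K3": trace(A3) / n**3,            # hom(K_3, G) = trace(A^3)
        "C4": trace(A4) / n**4,            # hom(C_4, G) = trace(A^4)
    }

def check_inequalities(t, eps=1e-12):
    return {
        "goodman (40)": t["K3"] >= t["K2"] * (2 * t["K2"] - 1) - eps,
        "kruskal-katona (41)": t["K3"] <= t["K2"] ** 1.5 + eps,
        "erdos (42)": t["C4"] >= t["K2"] ** 4 - eps,
        "blakley-roy (43), k = 3": t["P3"] >= t["K2"] ** 2 - eps,
    }

rng = random.Random(0)
results = [check_inequalities(densities(random_adjacency(30, p, rng))) for p in (0.2, 0.5, 0.9)]
print(all(all(r.values()) for r in results))
```

For random graphs the quadrilateral and path inequalities are nearly tight, in line with the remark that quasirandom and regular graphs give asymptotic equality.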

Using the formalism introduced above, the results in Example 9.7 can be expressed as follows:

(a) K_3 ≥ 2K_2² − K_2;

(b) K_2³ ≥ K_3²;

(c) C_4 ≥ K_2⁴;

(d) P_k ≥ K_2^{k−1};

(e) F ≥ K_2^{|E(F)|} (if F is bipartite).



The first three inequalities can be proved easily using the reflection positivity of the graph
parameters $t(\cdot, W)$. We will illustrate the method by deriving (a) through formal algebraic
manipulations.

Proof of (a) (Goodman's extension of the Mantel-Turán Theorem). Let $F$ denote the graph
$K_2K_1$ (an edge and an isolated node), and let $F_1$, $F_2$ and $F_3$ be obtained from $F$ by labeling all
three nodes, one endpoint of the edge, and the isolated node, respectively. Consider the quantum
graph $\hat{F}_1^2 + 2(F_2 - F_3)^2$, where $\hat{F}_1$ is the inclusion-exclusion quantum graph of $F_1$; being a sum
of squares, it is obviously nonnegative. Unlabeling the nodes and deleting
isolated nodes, we get $K_3 - 2K_2^2 + K_2$, which is thus nonnegative (see Figure 3).
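
The unlabeling step can be checked term by term. The following worked computation is a sketch
under two assumptions that are not spelled out above: the first summand is read as the
inclusion-exclusion (induced) version of $F_1$, written $\hat{F}_1$ here, and $P_3$ denotes the path on
three nodes. Products of labeled graphs glue along equally labeled nodes, and $[\![\cdot]\!]$ stands for
forgetting the labels and deleting isolated nodes.

```latex
\begin{align*}
[\![\hat{F}_1^2]\!] = [\![\hat{F}_1]\!] &= K_2 - 2P_3 + K_3, \\
[\![F_2^2]\!] = P_3, \qquad [\![F_2 F_3]\!] &= K_2^2, \qquad [\![F_3^2]\!] = K_2^2, \\
[\![2(F_2 - F_3)^2]\!] &= 2P_3 - 4K_2^2 + 2K_2^2 = 2P_3 - 2K_2^2, \\
[\![\hat{F}_1^2 + 2(F_2 - F_3)^2]\!] &= K_3 - 2K_2^2 + K_2 .
\end{align*}
```

In terms of densities the last line reads $t(K_3, W) - 2t(K_2, W)^2 + t(K_2, W) \ge 0$, which is
Goodman's inequality (40).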

Of the above inequalities, (b) and (c) can also be proved by similar arguments. The Blakley-Roy
inequality (d) is more difficult, but some extension of this kind of argument does work
[74]. Sidorenko's conjecture (e) would of course be very nice to prove this way (or by any other
means).

Using related methods, Razborov solved the long-standing problem of characterizing the
possible (edge-density, triangle-density) pairs, which in this setting means a description of the
set $\{(t(K_2, W), t(K_3, W)) : W \in \mathcal{W}_0\}$ by algebraic inequalities.

Figure 3: A computation proving the Mantel-Turán Theorem

The inequality in (d) also follows from reflection positivity if $k$ is even. It is not known whether
(d) for odd $k$ (or perhaps every valid algebraic inequality between subgraph densities) follows
from a finite number of semidefiniteness inequalities. However, every valid linear inequality
between homomorphism densities follows from semidefiniteness constraints (equivalently, from
"sums of squares" computations in graph algebras), as we shall see in the next section.

9.3 Positivstellensatz for graphs and spectral norms 

The machinery introduced in the previous sections allows us to suggest a very general approach 
to extremal graph theory. 

We can define the following partial order on $\mathcal{Q}_0$: we say that a quantum graph $x \ge 0$ if
$t(x, W) \ge 0$ for all $W \in \mathcal{W}_0$.

Let us call a quantum graph $y$ a square-sum if there are $k$-labeled quantum graphs $y_1, \dots, y_m$
(for some $k$ and $m$) such that $y$ can be obtained from $\sum_i y_i^2$ by forgetting the labels. It is easy
to see that every square-sum satisfies $y \ge 0$.

As an example, recall the definition (18) of the "inclusion-exclusion" quantum graph $\hat{F}$. Let
us label all nodes of $\hat{F}$, square it, and then forget the labels: we obtain $\hat{F}$ itself. This implies
that $\hat{F} \ge 0$ for all $F$. In the special case when $W = W_G$ for some graph $G$, this also follows from
our previous remark that $t(\hat{F}, G)$ is a probability, and hence nonnegative.

Is there a quantum graph $x \ge 0$ which is not a square-sum? I suspect that such quantum
graphs exist, but it might be difficult to prove this property. However, the following weaker
result can be proved [91].

Theorem 9.8 Let $x$ be a quantum graph. Then $x \ge 0$ if and only if for every $\varepsilon > 0$ there is a
square-sum $y$ such that $N(y) \le N(x)$ and $\|x - y\|_2 < \varepsilon$.

The proof depends on the duality theory of semidefinite programs. Note that we do not claim
that the $k$-labeled quantum graphs $y_i$ in the square-sum representation of $y$ also have bounded
$N(y_i)$; the proof gives arbitrarily large graphs if $\varepsilon$ is small.

In analogy with the Positivstellensatz for real polynomials, we may try to represent quantum
graphs $x \ge 0$ as quotients of square-sums: if $y$ and $z$ are square-sums and $y = zx + x$, then
$x \ge 0$.
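
The implication in the last sentence is a one-line computation. It is sketched here under the
assumption (standard for homomorphism densities, but not restated above) that $t(\cdot, W)$ is
multiplicative on products of unlabeled quantum graphs:

```latex
\[
0 \le t(y, W) = t(zx + x, W) = t(z, W)\,t(x, W) + t(x, W)
  = \bigl(1 + t(z, W)\bigr)\,t(x, W).
\]
```

Since $z$ is a square-sum, $t(z, W) \ge 0$, so the factor $1 + t(z, W)$ is at least 1 and dividing by it
gives $t(x, W) \ge 0$ for every $W$. Formally, $x$ is the quotient $y / (1 + z)$.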

We mention a couple of related questions. For every even positive integer $k$, the functional
$t(C_k, W)^{1/k}$ defines a norm on $\mathcal{W}$ (the Neumann-Schatten norm). This suggests the question:
for which other simple graphs $F$ is $t(F, W)^{1/|E(F)|}$ a norm (or seminorm) on $\mathcal{W}$? Hatami [M]
proved that if a simple graph $F$ has the property that $\|W\|_F = t(F, W)^{1/|E(F)|}$ is a norm, then
it satisfies Sidorenko's conjecture (Example 9.7(e)). He also proved that all cubes have this property.
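
For a stepfunction with $n$ equal steps, given by a symmetric $n \times n$ matrix $M$, the density
$t(C_4, W)$ equals $\mathrm{trace}(M^4)/n^4$, so the case $k = 4$ can be explored numerically. The sketch below
checks the triangle inequality and homogeneity on sample matrices; it is an illustration rather
than a proof, and the function names are hypothetical.

```python
import random


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def c4_norm(m):
    """t(C_4, W)^(1/4) for the stepfunction W given by the symmetric matrix m."""
    n = len(m)
    square = mat_mul(m, m)
    # trace(M^4) equals the sum of squared entries of M^2 because M^2 is symmetric
    trace_fourth = sum(square[i][j] ** 2 for i in range(n) for j in range(n))
    return (trace_fourth / n ** 4) ** 0.25


def random_symmetric(n, seed):
    """Symmetric matrix with entries drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m


a, b = random_symmetric(6, seed=1), random_symmetric(6, seed=2)
print(c4_norm(mat_add(a, b)) <= c4_norm(a) + c4_norm(b))
```

For the constant stepfunction with value $p \ge 0$ the quantity reduces to $p$, which gives a simple
consistency check.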

In view of the usefulness of extending graphs to graphons, it seems natural to define graph 
algebras of infinite linear combinations of graphs with appropriate convergence properties. It is 
not worked out, however, what the structure of the resulting algebra is, and how it is related to 
graphons. 

9.4 The maximum distance from a hereditary graph property 

A surprisingly general result is the theorem of Alon and Stav [9], proving that for every
hereditary property, a random graph with appropriate density is asymptotically the farthest from the
property in edit distance. The analytic results developed in this paper allow us to state and
prove a simple analytic analogue of this fact, from which the original result follows along with
generalizations.

Theorem 9.9 (Alon and Stav) For every hereditary graph property $\mathcal{P}$ there is a number $p$,
$0 \le p \le 1$, such that for every graph $G$ with $|V(G)| = n$,

$$d_1(G, \mathcal{P}) \le \mathbb{E}\bigl(d_1(G(n,p), \mathcal{P})\bigr) + o(1) \qquad (n \to \infty).$$

The following theorem [88] states a graphon version of this fact. 

Theorem 9.10 If $\mathcal{R}$ is the closure of a hereditary graph property, then the maximum of $d_1(\cdot, \mathcal{R})$
is attained by a constant function.

Our point in giving this generalization is to illustrate the power of extending graph problems
to a continuum. The key observation is the following, which follows from Lemma 8.10.

Lemma 9.11 If $\mathcal{R}$ is the closure of a hereditary graph property, then the set $\mathcal{W}_0 \setminus \mathcal{R}$ is convex.

Hence it follows that the $d_1$ distance from $\mathcal{R}$ is a concave function on $\mathcal{W}_0 \setminus \mathcal{R}$. Since $\mathcal{W}_0 \setminus \mathcal{R}$
is obviously invariant under the group of invertible measure preserving transformations of $[0, 1]$,
it is not hard to argue that there is a point (graphon) in $\mathcal{W}_0 \setminus \mathcal{R}$ maximizing the distance from $\mathcal{R}$
which is invariant under these measure preserving transformations, and so it must be a constant
function.

9.5 Which graphs are extremal? (Finitely forcible graphons) 

We call a graphon $W \in \mathcal{W}_0$ finitely forcible if there exist a finite list of graphs $F_1, \dots, F_m$ and
real numbers $a_1, \dots, a_m$ such that the equations $t(F_1, U) = a_1, \dots, t(F_m, U) = a_m$ are satisfied
by precisely those functions $U \in \mathcal{W}_0$ which arise from $W$ by measure preserving transformations.
Let us consider a very general type of graph theoretic extremal problem:

$$\begin{aligned}
\text{maximize} \quad & t(f, W) \\
\text{subject to} \quad & t(g_1, W) = a_1, \\
& t(g_2, W) = a_2, \\
& \qquad \vdots \\
& t(g_k, W) = a_k,
\end{aligned} \tag{45}$$

where $f, g_1, \dots, g_k$ are given quantum graphs and $a_1, \dots, a_k$ are given real numbers. Most of the
graphon versions of extremal problems discussed so far fit in this scheme.

It is easy to see that every finitely forcible graphon is the solution of an extremal problem of
the type (45). We conjecture the following converse:

Conjecture 9.12 Every extremal problem has a finitely forcible optimum. In other words, if a
finite set of constraints of the form $t(F_i, W) = a_i$ is satisfied by some graphon, then it is satisfied
by a finitely forcible graphon.

This may seem far-fetched, but the following heuristic supports it. Suppose that the system
$t(F_1, W) = a_1, \dots, t(F_k, W) = a_k$ has a solution $W$, but this solution is not forced by these
constraints. Then there is a graph $F$ such that $t(F, W)$ is not determined, i.e.,
$a = \min t(F, W) < \max t(F, W) = b$ (the max and min are taken over all solutions $W$ of the system).
Now add one of the conditions $t(F, W) = a$ or $t(F, W) = b$ to the system and repeat. It seems that
in very few (2-3) steps we always get a unique solution, i.e., a finitely forcible graphon.

Almost all classical extremal problems have a solution that is a stepfunction. It was shown by
Lovász and Sós [84] that every stepfunction is finitely forcible, and it was conjectured that these
are the only ones. Recently B. Szegedy and Lovász [90] found other finitely forcible graphons,
and so the problem of characterizing finitely forcible graphons is wide open.

We mention two examples of finitely forcible graphons that are not stepfunctions (the proofs
are not easy).

Example 9.13 Let $p(x, y)$ be a symmetric real polynomial that is monotone increasing on $[0, 1]^2$.
Define

$$W(x, y) = \begin{cases} 1, & \text{if } p(x, y) \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then $W$ is finitely forcible. It is conjectured that monotonicity is not needed here.

In contrast, one can show that if $W(x, y)$ is a polynomial in $x$ and $y$ (not a function of the
sign), then it is not finitely forcible.

Example 9.14 Let

$$W(x, y) = \begin{cases} 0, & \text{if the first bit where the binary expansions of } x \text{ and } y \text{ differ is at an odd position,} \\ 1, & \text{otherwise.} \end{cases}$$

Then $W$ is finitely forcible.
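
To make the definition in Example 9.14 concrete, the following sketch evaluates $W$ on floating
point inputs. Several choices here are assumptions rather than part of the example: positions are
counted from 1 starting right after the binary point, only the first 52 bits are compared, inputs
lie in $[0, 1)$, and points that agree in all compared bits fall under "otherwise". The function
names are hypothetical.

```python
def first_differing_bit(x, y, max_bits=52):
    """Return the 1-based position of the first binary digit after the point
    where x and y differ, or None if they agree in the first max_bits digits.
    Assumes 0 <= x, y < 1."""
    for position in range(1, max_bits + 1):
        x, y = 2 * x, 2 * y              # shift the next binary digit in front of the point
        bit_x, bit_y = int(x), int(y)    # that digit is now the integer part (0 or 1)
        if bit_x != bit_y:
            return position
        x, y = x - bit_x, y - bit_y      # drop the digit just compared
    return None


def bit_graphon(x, y):
    """W(x, y): 0 if the first differing bit is at an odd position, 1 otherwise."""
    position = first_differing_bit(x, y)
    if position is not None and position % 2 == 1:
        return 0
    return 1


# 0.25 = 0.01 and 0.75 = 0.11 in binary differ in bit 1; 0.5 = 0.10 and 0.75 differ in bit 2
print(bit_graphon(0.25, 0.75), bit_graphon(0.5, 0.75))
```

The resulting function is symmetric in $x$ and $y$, as a graphon must be, and has the iterated
block structure visible in the definition: the pattern on each diagonal quarter of the unit
square is determined by the later bits.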